{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Build genre dataset\n",
    "\n",
    "This notebook documents the creation of the multi-genre dataset used in \"Ascribing Historical Significance to Textual Similarity.\"\n",
    "\n",
    "It begins with some exploration of the \"subject\" and \"genre\" fields in a deduplicated dataset of fiction in HathiTrust. The process of deduplication is more fully documented [in a different repository.](https://github.com/tedunderwood/noveltmmeta)\n",
    "\n",
    "After surveying the landscape, I select certain subject and genre categories for the experiment. The sampling process is complicated, because I want to ensure that subsequent comparisons are able to compare non-overlapping sets.\n",
    "\n",
    "Finally, HathiTrust metadata is used to define a rough measure of the social proximity between pairs of genres.\n",
    "\n",
    "**Note to literary critics:** For the purpose of this experiment, I am *provisionally* borrowing the \"subject\" and \"genre\" categories librarians have used. A whole book could be written investigating the history of those categories, and arguing about their suitability. In other projects, I have used different genre definitions, borrowed from critics or bibliographers; in the future, I plan to use categories defined by the practice of book reviewers. The whole point of quantitative research on genre is that we don't have to be content with any particular set of categories, but are free to compare multiple perspectives."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "from collections import Counter\n",
    "import random, csv, math\n",
    "from matplotlib import pyplot as plt\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "meta = pd.read_csv('../../noveltmmeta/workmeta.tsv', sep = '\\t', low_memory = False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Find the most common genres\n",
    "\n",
    "The status of genre in library metadata is somewhat vexed. I won't have space to explore the topic fully here, but readers should be aware of a few pitfalls.\n",
    "\n",
    "For instance, genre categories were not systematically applied to library records until quite late in the 20th century. So our dataset contains many records, especially earlier records, that lack genre categories altogether. Moreover, specific categories often displace general ones. Don't be surprised, for instance, that in a dataset with more than 100,000 novels, only 3301 bear an explicit designation \"Novel.\"\n",
    "\n",
    "The metadata used here was originally contained in MARC records, and then transformed using [a script written by Ted Underwood and Michael L. Black.](https://github.com/tedunderwood/genredistance/blob/master/select_data/scrape_marc.py) Generally the genre tags reported below are drawn from MARC fields 655 or 155, and were originally chosen by catalogers from a list of [Library of Congress Genre / Form Terms.](https://www.loc.gov/catdir/cpso/genre_form_faq.pdf) However, a few of the larger categories below reflect information in the MARC header field. Some of these will seem paradoxical: for instance, why does a dataset of fiction contain 41,743 records tagged \"Not Fiction\"? That tag just reflects the absence of an explicit \"fiction\" marker in the header."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "allgenres = Counter()\n",
    "for idx, row in meta.iterrows():\n",
    "    genres = row.genres\n",
    "    if pd.isnull(genres):\n",
    "        continue\n",
    "    else:\n",
    "        genres = genres.split('|')\n",
    "    \n",
    "    for g in genres:\n",
    "        allgenres[g] += 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[('Fiction', 76816),\n",
       " ('NotFiction', 41743),\n",
       " ('UnknownGenre', 5845),\n",
       " ('Bibliographies', 4424),\n",
       " ('Novel', 3301),\n",
       " ('Juvenile audience', 3071),\n",
       " ('Short stories', 1332),\n",
       " ('Domestic fiction', 1117),\n",
       " ('Love stories', 980),\n",
       " ('Historical fiction', 950),\n",
       " ('Psychological fiction', 902),\n",
       " ('ContainsBiogMaterial', 889),\n",
       " ('Detective and mystery stories', 691),\n",
       " ('Mystery fiction', 658),\n",
       " ('Autobiography', 459),\n",
       " ('Science fiction', 376),\n",
       " ('Suspense fiction', 371),\n",
       " ('Biography', 297),\n",
       " ('NotBiographical', 290),\n",
       " ('Bildungsromans', 277),\n",
       " (\"Publishers' advertisements\", 275),\n",
       " ('Humorous stories', 229),\n",
       " ('Bildungsromane', 219),\n",
       " ('Adventure stories', 210),\n",
       " ('War stories', 207),\n",
       " ('Biographical fiction', 197),\n",
       " ('Fantasy fiction', 185),\n",
       " ('Humorous fiction', 185),\n",
       " ('Political fiction', 177),\n",
       " ('Mixed', 166),\n",
       " ('Western stories', 149),\n",
       " ('Fantastic fiction', 134),\n",
       " ('Christian fiction', 124),\n",
       " ('Juvenile literature', 124),\n",
       " ('Satire', 100),\n",
       " ('Horror fiction', 99),\n",
       " ('Juvenile works', 94),\n",
       " ('Legal stories', 93),\n",
       " ('History', 91),\n",
       " ('Spy stories', 84),\n",
       " ('Adventure fiction', 81),\n",
       " (\"Publishers' cloth bindings (Binding)\", 79),\n",
       " ('Autobiographical fiction', 77),\n",
       " ('Jewish fiction', 75),\n",
       " ('Black humor (Literature)', 75),\n",
       " ('New York', 63),\n",
       " ('Romantic suspense fiction', 62),\n",
       " ('Horror tales', 62),\n",
       " ('New York (State)', 61),\n",
       " ('Medical novels', 57),\n",
       " ('Mystery and detective fiction', 56),\n",
       " ('Bookplates (Provenance)', 55),\n",
       " ('Poetry', 55),\n",
       " ('Occult fiction', 54),\n",
       " ('Romantic suspense novels', 53),\n",
       " ('Allegories', 52),\n",
       " ('Sea stories', 52),\n",
       " ('Didactic fiction', 49),\n",
       " ('Religious fiction', 48),\n",
       " ('Erotic stories', 46),\n",
       " ('Catalog', 43),\n",
       " ('Imaginary voyages', 39),\n",
       " ('Dime novels', 38),\n",
       " ('Typefaces (Type evidence)', 37),\n",
       " ('United States', 35),\n",
       " ('Presentation inscriptions (Provenance)', 35),\n",
       " ('Musical fiction', 35),\n",
       " ('Indexes', 34),\n",
       " ('England', 33),\n",
       " ('Epistolary fiction', 33),\n",
       " ('Ghost stories', 31),\n",
       " ('Ink stamps (Provenance)', 30),\n",
       " ('Autographs (Provenance)', 29),\n",
       " ('Erotic fiction', 29),\n",
       " ('College stories', 29)]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "genres = allgenres.most_common()\n",
    "genres[0:75]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Select groups of novels carrying specific genre tags\n",
    "\n",
    "I select twenty genres, guided partly by an attempt to cover a range of different types of genres, and partly by a need to select genres with more than 100 examples in the dataset.\n",
    "\n",
    "In some cases, I'm going to want to fold several different terms into a single category, so this is expressed as a dictionary where the key is a short term I plan to use, and the value is a set of terms I'll be fishing for in the dataset and equating to the key. In most cases, I think these decisions about synonymy are pretty transparent."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "genre_categories = {'Novel': {'Novel'},\n",
    "              'Juvenile': {'Juvenile audience'},\n",
    "              'Short stories': {'Short stories'},\n",
    "              'Domestic': {'Domestic fiction'},\n",
    "              'Love': {'Love stories'},\n",
    "              'Historical': {'Historical fiction'},\n",
    "              'Psychological': {'Psychological fiction'},\n",
    "              'Mystery': {'Detective and mystery stories', 'Mystery and detective fiction', 'Mystery fiction'},\n",
    "              'Suspense': {'Suspense fiction'},\n",
    "              'SF': {'Science fiction'},\n",
    "              'Bildungsroman': {'Bildungsromans', 'Bildungsromane'},\n",
    "              'Biographical': {'Biographical fiction'},\n",
    "              'Humor': {'Humorous stories', 'Humorous fiction'},\n",
    "              'Fantasy': {'Fantasy fiction', 'Fantastic fiction'},\n",
    "              'Horror': {'Horror tales', 'Horror fiction', 'Occult fiction'},\n",
    "              'Western': {'Western stories'},\n",
    "              'Political': {'Political fiction'},\n",
    "              'War': {'War stories'},\n",
    "              'Adventure': {'Adventure stories', 'Adventure fiction'},\n",
    "              'Christian': {'Christian fiction'}\n",
    "             }"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "At this stage, we're not actually going to select specific examples of the genres. That needs to be done later, because I'm going to want to be able to guarantee that it's possible to compare genres and subjects to each other without overlap. So I need a complete list of genres, and a complete list of subjects, before selecting examples.\n",
    "\n",
    "For right now, we simply iterate through the list of genre_categories and gather sets of matching volumes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "def hasgenre(row, genre2match):\n",
    "    genres = row.genres\n",
    "    date = row.inferreddate\n",
    "    if pd.isnull(genres):\n",
    "        return None\n",
    "    elif pd.isnull(date) or int(date) < 1700 or int(date) > 2010:\n",
    "        return None\n",
    "    else:\n",
    "        genres = genres.split('|')\n",
    "        for g in genres:\n",
    "            if g == genre2match:\n",
    "                return row.docid\n",
    "        return None\n",
    "\n",
    "def gathergenre(genreset):\n",
    "    global meta\n",
    "    allmatches = set()\n",
    "    for g in genreset:\n",
    "        thisset = set(meta.apply(hasgenre, args = ([g]), axis = 1))\n",
    "        thisset.remove(None)\n",
    "        allmatches = allmatches.union(thisset)\n",
    "    return allmatches     "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will store the sets of docids associated with each genre in ```category_dict```."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "category_dict = dict()\n",
    "\n",
    "for name, category in genre_categories.items(): \n",
    "    examples = gathergenre(category)\n",
    "    category_dict[name] = examples"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### most common subjects\n",
    "\n",
    "Index terms contained in MARC fields 600-699 were counted as \"subjects\" if not specifically identified as \"genre\" terms or as \"geographic\" designations. Note that subject terms were added to MARC records several decades before genre terms came along. So in many 19c or early-20c records, things that we might consider genres are represented instead as subjects."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[('Fiction', 27190),\n",
       " ('History', 5469),\n",
       " ('Social life and customs', 2876),\n",
       " ('English fiction', 2436),\n",
       " ('Translations into English', 2024),\n",
       " ('United States', 1894),\n",
       " ('20th century', 1727),\n",
       " ('Juvenile fiction', 1325),\n",
       " ('American fiction', 1270),\n",
       " ('World War, 1939-1945', 1222),\n",
       " ('Folklore', 1163),\n",
       " ('England', 1138),\n",
       " ('Women', 1102),\n",
       " ('Description and travel', 994),\n",
       " ('Tales', 851),\n",
       " ('World War, 1914-1918', 793),\n",
       " ('19th century', 770),\n",
       " ('Short stories, American', 706),\n",
       " ('Great Britain', 684),\n",
       " ('Short stories', 647),\n",
       " ('Short stories, English', 642),\n",
       " ('Juvenile literature', 623),\n",
       " ('English literature', 578),\n",
       " ('Indians of North America', 574),\n",
       " ('New York (State)', 567),\n",
       " ('Civil War, 1861-1865', 554),\n",
       " ('Man-woman relationships', 550),\n",
       " ('African Americans', 541),\n",
       " ('American literature', 521),\n",
       " ('Gay men', 521),\n",
       " ('California', 511),\n",
       " ('France', 499),\n",
       " ('Fairy tales', 455),\n",
       " ('New York', 437),\n",
       " ('Lesbians', 436),\n",
       " ('Women authors', 424),\n",
       " ('Science fiction, American', 422),\n",
       " ('Young women', 420),\n",
       " ('Legends', 412),\n",
       " ('Conduct of life', 410),\n",
       " ('Literary collections', 409),\n",
       " ('Families', 400),\n",
       " ('Biography', 398),\n",
       " ('Social conditions', 392),\n",
       " ('India', 381),\n",
       " ('Jews', 369),\n",
       " ('Police', 336),\n",
       " ('Americans', 335),\n",
       " ('Mothers and daughters', 331),\n",
       " ('Frontier and pioneer life', 326),\n",
       " ('London', 326),\n",
       " ('Private investigators', 318),\n",
       " ('Young men', 305),\n",
       " ('Children', 297),\n",
       " ('Anecdotes', 296),\n",
       " ('Ireland', 292),\n",
       " ('Murder', 291),\n",
       " ('City and town life', 287),\n",
       " ('Collections', 282),\n",
       " ('Death', 280),\n",
       " ('Animals', 278),\n",
       " ('IsBiographical', 251),\n",
       " ('Revolution, 1775-1783', 251),\n",
       " ('Detective and mystery stories', 246),\n",
       " ('Literature', 245),\n",
       " ('Readers', 244),\n",
       " ('Personal narratives', 240),\n",
       " ('Politics and government', 236),\n",
       " ('fast', 235),\n",
       " ('Friendship', 234),\n",
       " (\"Children's stories, American\", 233),\n",
       " ('American wit and humor', 231),\n",
       " ('Fathers and sons', 226),\n",
       " ('China', 220),\n",
       " ('Boys', 218),\n",
       " ('Girls', 215),\n",
       " ('Australia', 213),\n",
       " ('Scotland', 209),\n",
       " ('Science fiction', 208),\n",
       " ('Italy', 208),\n",
       " ('Country life', 205),\n",
       " ('Horror tales, American', 201),\n",
       " ('Dogs', 198),\n",
       " ('Germany', 197),\n",
       " (\"Children's stories\", 194),\n",
       " ('Sisters', 193),\n",
       " ('African American women', 191),\n",
       " ('Vietnam War, 1961-1975', 191),\n",
       " ('Fathers and daughters', 190),\n",
       " ('Fantasy fiction, American', 184),\n",
       " ('Gay erotic literature', 183),\n",
       " ('Detective and mystery stories, American', 177),\n",
       " ('Colonial period, ca. 1600-1775', 175),\n",
       " ('History and criticism', 172),\n",
       " ('Teenage girls', 171),\n",
       " ('British', 169),\n",
       " ('Marriage', 168),\n",
       " ('Travel', 167),\n",
       " ('Hunting', 167),\n",
       " ('Voyages and travels', 164)]"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "allsubjects = Counter()\n",
    "for idx, row in meta.iterrows():\n",
    "    genres = row.subjects\n",
    "    if pd.isnull(genres):\n",
    "        continue\n",
    "    else:\n",
    "        genres = genres.split('|')\n",
    "    \n",
    "    for g in genres:\n",
    "        allsubjects[g] += 1\n",
    "\n",
    "subjects = allsubjects.most_common()\n",
    "subjects[0:100]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [],
   "source": [
    "subjects2find = {'Subj: Detective': {'Women detectives', 'Detective and mystery stories', 'Detective stories', \n",
    "                                     'Mystery and detective stories', 'Detective and mystery stories, French', \n",
    "                                     'Detective and mystery stories, New Zealand', 'Detective and mystery stories, Danish', \n",
    "                                     'Detective and mystery stories, Scottish', 'Detective and mystery stories, English', \n",
    "                                     'Detective and mystery stories, American'},\n",
    "                 'Subj: Man-woman': {'Man-woman relationships', 'Marriage'}, \n",
    "                 'Subj: SF, Other': {'Science fiction, French', 'Science fiction, Canadian', \n",
    "                              'Science fiction, Polish', 'Science fiction, Australian', \n",
    "                              'Science fiction, Russian', 'Science fiction, English', \n",
    "                              'Science fiction'},\n",
    "                 'Subj: SF, American': {'Science fiction, American'},\n",
    "                 'Subj: Short stories, Other': {'Short stories, Icelandic', 'Short stories, Nepali',\n",
    "                                                'Short stories, Serbian', 'Short stories, Turkish', \n",
    "                                                'Short stories, Ghanaian (English)', 'Short stories, Indic (English)',\n",
    "                                                'Short stories in Russian, 1917-1945 - English texts', \n",
    "                                                'Short stories in French, 1900-1945 - English texts', \n",
    "                                                'Short stories, Ukrainian', 'Short stories, Caribbean (English)', \n",
    "                                                'Short stories, Sindhi', 'Short stories, Urdu', 'Short stories, Arabic', \n",
    "                                                'Short stories, Vietnamese', 'Short stories, Lithuanian', \n",
    "                                                'Short stories, African (English)', 'Short stories, Malaysian', \n",
    "                                                'Short stories, Australian', 'Short stories, Pakistani', \n",
    "                                                'Short stories, Irish', 'Short stories, South African (English)', \n",
    "                                                'Short stories, Bulgarian', 'Short stories, Panjabi', \n",
    "                                                'Short stories, Canadian', 'Short stories, New Zealand', \n",
    "                                                'Short stories, Norwegian', 'Short stories, Dutch', \n",
    "                                                'Short stories, Malay', 'Short stories, Tarascan', \n",
    "                                                'Short stories, English', 'Short stories, South African'},\n",
    "                 'Subj: Short stories, American': {'Short stories, American'},\n",
    "                 'Subj: Fairy tales': {'Fairy tales, American', 'Fairy tales, Scottish',\n",
    "                                       'Fairy tales.', 'Fairy tales, Japanese', \n",
    "                                       'Fairy tales, German', 'Fairy tales, English', \n",
    "                                       'Fairy tales, French', 'Fairy tales'}, \n",
    "                 \"Subj: Fantasy\": {'Fantasy fiction, Scottish', 'Fantasy games', 'Fantasy fiction, Chinese',\n",
    "                                   'Fantasy', 'Fantasy fiction, Yiddish', 'Fantasy fiction, French', \n",
    "                                   'Fantasy fiction', 'Fantasy fiction, Russian', 'Fantasy fiction, English',\n",
    "                                   'Fantasy fiction, American', 'Fantasy fiction, Romanian'},\n",
    "                 'Subj: Horror': {'Horror short stories in English, 1837-1945 - Anthologies', \n",
    "                                  'Horror tales, Singaporean', 'Horror & ghost stories', 'Horror tales',\n",
    "                                  'Horror tales, Irish',  'Horror tales, American', 'Horror tales, Scottish',\n",
    "                                  'Horror stories', 'Horror tales, Canadian', 'Horror tales, English',\n",
    "                                  'Ghost stories', 'Ghosts'},\n",
    "                 'Subj: History': {'History'},\n",
    "                 'Subj: Humor': {'English wit and humor', 'Humor',\n",
    "                          'American wit and humor', 'Humorous stories, American', 'Humorous stories'},\n",
    "                 'Subj: Juvenile': {\"Juvenile literature\", \"Juvenile fiction\", \"Children's stories\",\n",
    "                                   \"Children's stories, American\"}\n",
    "                }"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [],
   "source": [
    "def hassubject(row, subj2match):\n",
    "    ''' A little different from hasgenre, because it's looking\n",
    "    for a phrase *in* a subject rather than attempting an exact\n",
    "    match.\n",
    "    '''\n",
    "    subjects = row.subjects\n",
    "    if pd.isnull(subjects) or pd.isnull(row.inferreddate):\n",
    "        return None\n",
    "    elif int(row.inferreddate) < 1700 or int(row.inferreddate) > 2010:\n",
    "        return None\n",
    "    else:\n",
    "        subjects = subjects.split('|')\n",
    "        for s in subjects:\n",
    "            if s in subj2match:\n",
    "                return row.docid\n",
    "        \n",
    "    return None\n",
    "\n",
    "def gathersubject(subj):\n",
    "    global meta\n",
    "    thisset = set(meta.apply(hassubject, args = ([subj]), axis = 1))\n",
    "    return thisset  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Subj: Man-woman\n",
      "Subj: Detective\n",
      "Subj: Short stories, Other\n",
      "Subj: Fairy tales\n",
      "Subj: Humor\n",
      "Subj: Horror\n",
      "Subj: Short stories, American\n",
      "Subj: Juvenile\n",
      "Subj: History\n",
      "Subj: SF, Other\n",
      "Subj: SF, American\n",
      "Subj: Fantasy\n"
     ]
    }
   ],
   "source": [
    "meta.reset_index(inplace = True)\n",
    "for name, category in subjects2find.items():\n",
    "    print(name)\n",
    "    examples = gathersubject(category)\n",
    "    examples.remove(None)\n",
    "    category_dict[name] = examples"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [],
   "source": [
    "# switch index for easier access\n",
    "\n",
    "meta.set_index('docid', inplace = True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Short stories   |   1332   |   1985.43\n",
      "Subj: Humor   |   449   |   1922.93\n",
      "Suspense   |   371   |   2003.47\n",
      "Subj: Detective   |   650   |   1974.81\n",
      "Domestic   |   1117   |   1992.33\n",
      "Christian   |   124   |   1975.17\n",
      "Horror   |   205   |   1989.45\n",
      "Juvenile   |   3071   |   1894.53\n",
      "Historical   |   950   |   1986.11\n",
      "Subj: Juvenile   |   2228   |   1909.9\n",
      "Fantasy   |   317   |   1992.12\n",
      "Subj: SF, Other   |   331   |   1970.75\n",
      "Subj: Man-woman   |   694   |   1972.54\n",
      "Adventure   |   288   |   1981.65\n",
      "Humor   |   406   |   1991.66\n",
      "Love   |   980   |   1982.12\n",
      "Novel   |   3301   |   1985.96\n",
      "Western   |   149   |   1970.6\n",
      "Political   |   177   |   1985.12\n",
      "SF   |   375   |   1988.16\n",
      "Subj: Horror   |   445   |   1975.04\n",
      "Bildungsroman   |   490   |   1985.69\n",
      "Subj: Short stories, American   |   706   |   1969.27\n",
      "Subj: History   |   5090   |   1930.99\n",
      "Subj: Fantasy   |   360   |   1978.74\n",
      "Subj: Fairy tales   |   468   |   1923.91\n",
      "Subj: Short stories, Other   |   1047   |   1974.07\n",
      "War   |   207   |   1967.19\n",
      "Psychological   |   902   |   1996.89\n",
      "Subj: SF, American   |   421   |   1981.1\n",
      "Biographical   |   197   |   1988.79\n",
      "Mystery   |   1377   |   1986.73\n"
     ]
    }
   ],
   "source": [
    "with open('summary_of_available_genres.tsv', mode = 'w', encoding = 'utf-8') as f:\n",
    "    writer = csv.DictWriter(f, delimiter = '\\t', fieldnames = ['genre', 'numvols', 'meandate'])\n",
    "    writer.writeheader()\n",
    "    for catname, examples in category_dict.items():\n",
    "        dates = meta.loc[examples, 'inferreddate']\n",
    "        print(catname, '  |  ', len(examples), '  |  ', round(np.mean(dates), 2))\n",
    "        outrow = dict()\n",
    "        outrow['genre'] = catname\n",
    "        outrow['numvols'] = len(examples)\n",
    "        outrow['meandate'] = round(np.mean(dates), 2)\n",
    "        writer.writerow(outrow)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So, we've got 32 categories: 20 genres and 12 subjects.\n",
    "\n",
    "### Selecting examples of the categories\n",
    "\n",
    "**Basic samples.** First, we simply select 102 random examples of each category. (We're going to use 100-text samples, but it's wise to leave a little extra room in case one text turns out to be defective / missing.)\n",
    "\n",
    "**Intersectional samples.** But we also want to be able to ensure non-overlapping comparisons if needed. So we also compare each category to all the others, and where there are intersections, we create extra categories like\n",
    "\n",
    "    Horror-Not-Humor\n",
    "\n",
    "With enough extra volumes to ensure non-overlap.\n",
    "\n",
    "**B samples.**\n",
    "\n",
    "In genres where we have enough examples to go around, we also create a second \"B sample,\" to permit self-comparison. These have to be non-overlapping with the A sample, but we don't do the whole intersectional song and dance for the B samples."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Basic samples selected.\n",
      "\n",
      "Intersectional samples:\n",
      "-----------------------\n",
      "Short stories Fantasy 1\n",
      "Short stories Subj: SF, Other 1\n",
      "Short stories Subj: Man-woman 1\n",
      "Short stories Subj: Horror 3\n",
      "Short stories Subj: Short stories, American 1\n",
      "Short stories Subj: Short stories, Other 1\n",
      "Short stories Subj: SF, American 2\n",
      "Short stories Subj: Fantasy 1\n",
      "Christian Domestic 1\n",
      "Christian Fantasy 5\n",
      "Christian Humor 1\n",
      "Christian Biographical 5\n",
      "Christian SF 1\n",
      "Christian Psychological 1\n",
      "Suspense Political 1\n",
      "Suspense Novel 1\n",
      "Suspense Bildungsroman 1\n",
      "Subj: Detective Horror 1\n",
      "Subj: Detective Juvenile 1\n",
      "Subj: Detective Historical 1\n",
      "Subj: Detective Subj: Juvenile 1\n",
      "Subj: Detective Adventure 1\n",
      "Subj: Detective Mystery 1\n",
      "Domestic Christian 1\n",
      "Domestic Subj: Man-woman 1\n",
      "Domestic Adventure 1\n",
      "Domestic Novel 1\n",
      "Domestic Love 1\n",
      "Domestic Bildungsroman 1\n",
      "Domestic War 1\n",
      "Domestic Psychological 2\n",
      "Subj: Humor Subj: SF, Other 1\n",
      "Horror Subj: Detective 1\n",
      "Horror Fantasy 2\n",
      "Horror Subj: SF, Other 1\n",
      "Horror Western 1\n",
      "Horror Love 1\n",
      "Horror SF 1\n",
      "Horror Subj: Horror 2\n",
      "Horror Mystery 1\n",
      "Horror Psychological 1\n",
      "Horror Subj: SF, American 1\n",
      "Juvenile Subj: Detective 1\n",
      "Juvenile Subj: Juvenile 2\n",
      "Historical Subj: Detective 1\n",
      "Historical Subj: Man-woman 1\n",
      "Historical Adventure 1\n",
      "Historical Political 1\n",
      "Historical Biographical 3\n",
      "Historical Novel 1\n",
      "Historical Love 1\n",
      "Historical Subj: History 1\n",
      "Historical War 4\n",
      "Subj: Juvenile Subj: Detective 1\n",
      "Subj: Juvenile Juvenile 2\n",
      "Subj: Juvenile Subj: SF, Other 1\n",
      "Subj: Juvenile Subj: History 1\n",
      "Subj: Juvenile Subj: Fairy tales 1\n",
      "Fantasy Short stories 1\n",
      "Fantasy Christian 5\n",
      "Fantasy Horror 2\n",
      "Fantasy Adventure 1\n",
      "Fantasy Novel 1\n",
      "Fantasy SF 1\n",
      "Fantasy Subj: Horror 2\n",
      "Fantasy War 2\n",
      "Fantasy Psychological 1\n",
      "Fantasy Subj: Fairy tales 1\n",
      "Subj: SF, Other Short stories 1\n",
      "Subj: SF, Other Subj: Humor 1\n",
      "Subj: SF, Other Horror 1\n",
      "Subj: SF, Other Subj: Juvenile 1\n",
      "Subj: SF, Other SF 3\n",
      "Subj: SF, Other Subj: Short stories, American 1\n",
      "Subj: SF, Other Subj: SF, American 1\n",
      "Subj: SF, Other Subj: Fantasy 1\n",
      "Subj: Man-woman Short stories 1\n",
      "Subj: Man-woman Domestic 1\n",
      "Subj: Man-woman Historical 1\n",
      "Subj: Man-woman Adventure 1\n",
      "Subj: Man-woman Political 1\n",
      "Subj: Man-woman Novel 1\n",
      "Subj: Man-woman Love 3\n",
      "Subj: Man-woman Bildungsroman 1\n",
      "Subj: Man-woman Subj: Fairy tales 1\n",
      "Subj: Man-woman Subj: Fantasy 1\n",
      "Adventure Subj: Detective 1\n",
      "Adventure Domestic 1\n",
      "Adventure Historical 1\n",
      "Adventure Fantasy 1\n",
      "Adventure Subj: Man-woman 1\n",
      "Adventure Political 1\n",
      "Adventure Biographical 1\n",
      "Adventure Novel 1\n",
      "Adventure Western 2\n",
      "Adventure Bildungsroman 1\n",
      "Adventure War 2\n",
      "Adventure Psychological 3\n",
      "Humor Christian 1\n",
      "Humor Political 1\n",
      "Humor Novel 1\n",
      "Humor Bildungsroman 3\n",
      "Humor Mystery 1\n",
      "Humor Psychological 1\n",
      "Political Suspense 1\n",
      "Political Historical 1\n",
      "Political Subj: Man-woman 1\n",
      "Political Adventure 1\n",
      "Political Humor 1\n",
      "Political Novel 1\n",
      "Political Love 2\n",
      "Political Bildungsroman 2\n",
      "Political Mystery 1\n",
      "Biographical Christian 5\n",
      "Biographical Historical 3\n",
      "Biographical Adventure 1\n",
      "Biographical Western 4\n",
      "Biographical Subj: History 4\n",
      "Biographical War 2\n",
      "Novel Suspense 1\n",
      "Novel Domestic 1\n",
      "Novel Historical 1\n",
      "Novel Fantasy 1\n",
      "Novel Subj: Man-woman 1\n",
      "Novel Adventure 1\n",
      "Novel Humor 1\n",
      "Novel Political 1\n",
      "Novel SF 3\n",
      "Novel Subj: Horror 1\n",
      "Novel Bildungsroman 2\n",
      "Novel Subj: History 1\n",
      "Novel Mystery 1\n",
      "Western Horror 1\n",
      "Western Adventure 2\n",
      "Western Biographical 4\n",
      "Western Love 2\n",
      "Western War 1\n",
      "Love Domestic 1\n",
      "Love Horror 1\n",
      "Love Historical 1\n",
      "Love Subj: Man-woman 3\n",
      "Love Political 2\n",
      "Love Western 2\n",
      "Love Bildungsroman 1\n",
      "Love War 2\n",
      "Love Psychological 1\n",
      "SF Christian 1\n",
      "SF Horror 1\n",
      "SF Fantasy 1\n",
      "SF Subj: SF, Other 3\n",
      "SF Novel 3\n",
      "SF Subj: SF, American 1\n",
      "Subj: Horror Short stories 3\n",
      "Subj: Horror Horror 2\n",
      "Subj: Horror Fantasy 2\n",
      "Subj: Horror Novel 1\n",
      "Subj: Horror Subj: Short stories, Other 1\n",
      "Subj: Horror Subj: SF, American 1\n",
      "Subj: Horror Subj: Fantasy 5\n",
      "Bildungsroman Suspense 1\n",
      "Bildungsroman Domestic 1\n",
      "Bildungsroman Subj: Man-woman 1\n",
      "Bildungsroman Adventure 1\n",
      "Bildungsroman Humor 3\n",
      "Bildungsroman Political 2\n",
      "Bildungsroman Novel 2\n",
      "Bildungsroman Love 1\n",
      "Bildungsroman War 1\n",
      "Subj: Short stories, American Short stories 1\n",
      "Subj: Short stories, American Subj: SF, Other 1\n",
      "Subj: History Historical 1\n",
      "Subj: History Subj: Juvenile 1\n",
      "Subj: History Biographical 4\n",
      "Subj: History Novel 1\n",
      "Subj: History War 2\n",
      "Mystery Subj: Detective 1\n",
      "Mystery Horror 1\n",
      "Mystery Humor 1\n",
      "Mystery Political 1\n",
      "Mystery Novel 1\n",
      "Subj: Short stories, Other Short stories 1\n",
      "Subj: Short stories, Other Subj: Horror 1\n",
      "War Domestic 1\n",
      "War Historical 4\n",
      "War Fantasy 2\n",
      "War Adventure 2\n",
      "War Biographical 2\n",
      "War Western 1\n",
      "War Love 2\n",
      "War Bildungsroman 1\n",
      "War Subj: History 2\n",
      "Psychological Christian 1\n",
      "Psychological Domestic 2\n",
      "Psychological Horror 1\n",
      "Psychological Fantasy 1\n",
      "Psychological Adventure 3\n",
      "Psychological Humor 1\n",
      "Psychological Love 1\n",
      "Subj: SF, American Short stories 2\n",
      "Subj: SF, American Horror 1\n",
      "Subj: SF, American Subj: SF, Other 1\n",
      "Subj: SF, American SF 1\n",
      "Subj: SF, American Subj: Horror 1\n",
      "Subj: SF, American Subj: Fantasy 5\n",
      "Subj: Fairy tales Subj: Juvenile 1\n",
      "Subj: Fairy tales Fantasy 1\n",
      "Subj: Fairy tales Subj: Man-woman 1\n",
      "Subj: Fairy tales Subj: Fantasy 3\n",
      "Subj: Fantasy Short stories 1\n",
      "Subj: Fantasy Subj: SF, Other 1\n",
      "Subj: Fantasy Subj: Man-woman 1\n",
      "Subj: Fantasy Subj: Horror 5\n",
      "Subj: Fantasy Subj: SF, American 5\n",
      "Subj: Fantasy Subj: Fairy tales 3\n",
      "\n",
      "Intersectional samples selected.\n",
      "\n",
      "B samples:\n",
      "----------\n",
      "Short stories B\n",
      "Subj: Humor B\n",
      "Suspense B\n",
      "Subj: Detective B\n",
      "Domestic B\n",
      "Juvenile B\n",
      "Historical B\n",
      "Subj: Juvenile B\n",
      "Fantasy B\n",
      "Subj: SF, Other B\n",
      "Subj: Man-woman B\n",
      "Humor B\n",
      "Love B\n",
      "Novel B\n",
      "SF B\n",
      "Subj: Horror B\n",
      "Bildungsroman B\n",
      "Subj: Short stories, American B\n",
      "Subj: History B\n",
      "Subj: Fantasy B\n",
      "Subj: Fairy tales B\n",
      "Subj: Short stories, Other B\n",
      "Psychological B\n",
      "Subj: SF, American B\n",
      "Mystery B\n"
     ]
    }
   ],
   "source": [
    "genredist = dict()\n",
    "\n",
    "def get_pure_examples(catname1, catname2, catset1, catset2, category_dict, overlaplen):\n",
    "    only1 = category_dict[catname1] - category_dict[catname2]\n",
    "    # Let's sample from examples of category 1 known not to be in\n",
    "    # category 2\n",
    "    \n",
    "    only1 = only1 - catset1\n",
    "    # Because these are supposed to be additional examples, not ones\n",
    "    # already in the category\n",
    "    \n",
    "    onenottwo = random.sample(only1, overlaplen)\n",
    "    return onenottwo\n",
    "\n",
    "# first, we select the basic genre samples\n",
    "\n",
    "for catname, examples in category_dict.items():\n",
    "    \n",
    "    chosen = random.sample(examples, 102)\n",
    "    genredist[catname] = set(chosen)\n",
    "        \n",
    "print()\n",
    "print('Basic samples selected.')\n",
    "print()\n",
    "print('Intersectional samples:')\n",
    "print('-----------------------')\n",
    "\n",
    "# then create special intersectional categories\n",
    "# to permit non-overlapping comparisons\n",
    "\n",
    "toadd = dict()\n",
    "for g1, gset1 in genredist.items():\n",
    "    for g2, gset2 in genredist.items():\n",
    "        if g1 == g2:\n",
    "            continue\n",
    "        \n",
    "        overlap = len(gset1.intersection(gset2))\n",
    "        if overlap > 0:\n",
    "            print(g1, g2, overlap)\n",
    "            only1 = get_pure_examples(g1, g2, gset1, gset2, category_dict, overlap)\n",
    "            name1not2 = g1 + '-Not-' + g2\n",
    "            toadd[name1not2] = only1\n",
    "            \n",
    "            only2 = get_pure_examples(g2, g1, gset2, gset1, category_dict, overlap)\n",
    "            name2not1 = g2 + '-Not-' + g1\n",
    "            toadd[name2not1] = only2\n",
    "\n",
    "for key, value in toadd.items():\n",
    "    genredist[key] = value\n",
    "    \n",
    "print()            \n",
    "print('Intersectional samples selected.')\n",
    "print()\n",
    "print('B samples:')\n",
    "print('----------')\n",
    "# Then we select \"B\" samples in cases where there are enough instances\n",
    "# to make this practical.\n",
    "\n",
    "for catname, examples in category_dict.items():\n",
    "    \n",
    "    chosen = genredist[catname]\n",
    "    \n",
    "    if len(examples) > 300:\n",
    "        remainder = examples - set(chosen)\n",
    "        nameB = catname + ' B'\n",
    "        chosenB = random.sample(remainder, 102)\n",
    "        genredist[nameB] = chosenB\n",
    "        print(nameB)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "5585\n"
     ]
    }
   ],
   "source": [
    "allvols = set()\n",
    "for name, genreset in genredist.items():\n",
    "    allvols = allvols.union(genreset)\n",
    "print(len(allvols))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exploratory analysis of selected volumes\n",
    "\n",
    "Just to get a sense of what we're looking at.\n",
    "\n",
    "The cell immediately below also importantly creates the dictionary ```genresfordocs```, which will be used a lot below. This is a dict where each key is a volume ID and the value is a set of genre tags possessed by the volume."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "211\n",
      "5585\n",
      "\n",
      "690\n"
     ]
    }
   ],
   "source": [
    "# Now let's get a sense of how these volumes are distributed on the timeline\n",
    "genresfordocs = dict()\n",
    "for genrename, docsingenre in genredist.items():\n",
    "    for docid in docsingenre:\n",
    "        if docid not in genresfordocs:\n",
    "            genresfordocs[docid] = set()\n",
    "        genresfordocs[docid].add(genrename)\n",
    "\n",
    "datedist = Counter()\n",
    "datebygenre = dict()\n",
    "allgenrenames = set()\n",
    "\n",
    "for docid, genreset in genresfordocs.items():\n",
    "    date = int(meta.loc[docid, 'inferreddate'])\n",
    "    datedist[date] += 1\n",
    "    for g in genreset:\n",
    "        allgenrenames.add(g)\n",
    "        if g not in datebygenre:\n",
    "            datebygenre[g] = Counter()\n",
    "        datebygenre[g][date] += 1\n",
    "\n",
    "print(len(datedist))\n",
    "print(sum(datedist.values()))\n",
    "print()\n",
    "\n",
    "maxforyear = dict()\n",
    "for d in datedist.keys():\n",
    "    maximum = 0\n",
    "    for g in allgenrenames:\n",
    "        if datebygenre[g][d] > maximum:\n",
    "            maximum = datebygenre[g][d]\n",
    "    maxforyear[d] = maximum\n",
    "print(sum(maxforyear.values()))  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### maximum number of vols in a year\n",
    "\n",
    "This is not all the volumes, but just the maximum in any single genre. It's an important figure because we're going to use this to guarantee that our random datasets have enough volumes to create a date-matching contrast set for any genre."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.collections.PathCollection at 0x1204679b0>"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXwAAAEACAYAAACwB81wAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHL9JREFUeJzt3X+MHPd53/H3c8c7ckWKpBSeyUAkb239AG3ELEW3RFKr\n0RIxTTe/5FiAbbVA5VRw7ca01DZAHRsBSARFERmBEP9DGI2PhRyUJ7ZuU9lBYjuGuSjcxiETUaZs\n0rIdeylKNu9WsSWYKlUy5NM/dpY7OzuzP+7mbmf3+3kBi5v9zq/vfHf5cO67zz5n7o6IiIy/iWF3\nQEREVocCvohIIBTwRUQCoYAvIhIIBXwRkUAo4IuIBKLvgG9m283sq2b2LTN7zsw+GrUfNrMXzeyZ\n6PGuleuuiIgslfWbh29m24Bt7v6smW0A/gZ4AHgf8FN3f2LluikiIsu1pt8N3f0ScClavmxm54E7\notW2An0TEZEcLWkO38zKwB7gr6KmQ2b2rJl9xsw25dQ3ERHJ0cABP5rO+RzwmLtfBo4Cb3L3PTR+\nA9DUjohIAfU9hw9gZmuAPwX+3N0/lbJ+FviCu+9OWaeiPSIiS+DuuUybD3qHfww4Fw/20Ye5Te8B\nvpm1s7uP7OPw4cND70Oo/R/lvqv/w3+Mev/z1PeHtmb2duCfA8+Z2RnAgU8A/8zM9gA3gBrwoVx7\nKCIiuRgkS+d/A5Mpq76YX3dERGSl6Ju2fapUKsPuwrKMcv9Hue+g/g/bqPc/TwN9aLusE5n5ap1L\nRGRcmBk+pA9tRURkRCngi4gEQgFfRCQQCvgiIoFQwBcRCYQCvohIIBTwRUQCoYAvIhIIBXwRkUAo\n4IuIBEIBX0QkEAr4IiKBUMAXEQmEAr6IBKter3P69Gnq9fqwu7IqFPBFJEjz8yeYnd3FgQMfZnZ2\nF/PzJ4bdpRWnevgiEpx6vc7s7C6uXDkJ7AbOUirt58KFbzMzMzPs7rVRPXwRkWWo1WpMT5dpBHuA\n3UxNzVKr1YbXqVWggC8iwSmXy1y9WgPORi1nuXbtAuVyeXidWgUK+CISnJmZGebmjlIq7Wfjxr2U\nSvuZmztauOmcvGkOX0SCVa/XqdVqlMvlwgb7POfwFfBFRApMH9qKiMjAFPBFRAKhgC8iEggFfBGR\nQCjgi4gEou+Ab2bbzeyrZvYtM3vOzB6N2m8zsy+b2fNm9iUz27Ry3RURkaXqOy3TzLYB29z9WTPb\nAPwN8ADwm8DfufsnzexjwG3u/jsp+ystU0RkQENJy3T3S+7+bLR8GTgPbKcR9J+MNnsSeHceHRMR\nkXwtaQ7fzMrAHuDrwFZ3X4DGfwrAG/LqnIiI5GfggB9N53wOeCy600/O02jeRkSkgNYMsrGZraER\n7P/Y3Z+OmhfMbKu7L0Tz/ItZ+x85cuTmcqVSoVKpDNxhEZFxVq1WqVarK3LsgWrpmNlngZfd/d/F\n2h4Hfuzuj+tDWxGRfA2leJqZvR34X8BzNKZtHPgEcAr4r8AO4ALwXnd/JWV/BXwRkQGpWqaISCBU\nLVNEZJnq9TqnT5+mXq8PuyurRgFfRIIzP3+C2dldHDjwYWZndzE/f2LYXVoVmtIRkaDU63VmZ3dx\n5cpJGn/E/Cyl0n4uXPh2If/qlaZ0RESWqFarMT1dphHsAXYzNTVLrVYbXqdWiQK+iASlXC5z9WoN\nOBu1nOXatQuUy+XhdWqVKOCLSFBmZmaYmztKqbSfjRv3UirtZ27uaCGnc/KmOXwRCVK9XqdWq1Eu\nlwsd7JWHLyISCH1oKyIiA1PAFxEJhAK+iEggFPBFRAKhgC8iEggFfBGRQCjgi4gEQgFfRIISYlnk\nJgV8EQlGqGWRm/RNWxEJwqiVRW7SN21FRAYUclnkJgV8EQlCyGWRmxTwRSQIIZdFbtIcvogEZVTK\nIjepPLKISCD0oa2ISE5CystXwBeRYI
WWl68pHREJ0qjk5WtKR0RkmULMy1fAF5EghZiXr4AvIkEK\nMS+/7zl8M5sDfhVYcPfdUdth4IPAYrTZJ9z9ixn7aw5fRAqn6Hn5Q8nDN7P7gMvAZxMB/6fu/kQf\n+yvgi0ghFD3Ixw3lQ1t3/xrwk7T+5NEREZHVEFoqZtxAaZlmNgt8IXGH/wHgVeCvgd9291cz9tUd\nvogM1aikYsbleYe/Zpn7HwV+z93dzP4D8ATwSNbGR44cublcqVSoVCrLPL2ISP+aqZhXrnSmYhYl\n4FerVarV6ooce1l3+P2ui9brDl9Ehir0O/xB0zKN2Jy9mW2LrXsP8M08OiUishJCTMWMGyRL5zhQ\nAX4GWAAOA/uBPcANoAZ8yN0XMvbXHb6IFEKoWTqqpSMiUmCqpSMiIgNTwBcRCYQCvohIIBTwRUQC\noYAvIhIIBXwRkUAo4IuIBEIBX0QkEAr4IiKBUMAXEQmEAr6ISCAU8EVEAqGALyISCAV8EZFALPdP\nHIqIDF29XufMmTMA7Nixg8uXL6fWuh+lOvgrQQFfREba/PwJHn74g1y7dh3YBLxCqXQX8BJzc0d5\n6KH33dzukUd+i+npMlev1trWhUJ/AEVERla9Xmfnznt4/XUD/gfwIND592qBkftbtk15/gEU3eGL\nyMiq1WpMTm4F1kePMo2ADrCbqalZarUaANPTZa5c6VxX9ICfJwV8ERlZ5XKZ69cXAANeo/Gntc/S\nvIu/du0C5XIZgKtXs9eFQlk6IjKyZmZmOHbs00xNXQN+BZgCfp5S6a2USvuZmzvKzMwMMzMzzM0d\npVTaz8aNe9vWhURz+CIy8sY5SyfPOXwFfBGRAssz4GtKR0TGUr1e5/Tp09Tr9dTnIVLAF5GxMz9/\ngtnZXRw48GFmZ3fx0Y/+m7bn8/Mnht3FodCUjoiMlXq9nsi5rwK/DHydUcvBB03piIhkqtVqTE+X\naeXjrwd2kJWfHxIFfBEZK+VyOZZzD438/Iux52Hm4IMCvoiMmc6c+wc5dOiDwefgwwBz+GY2B/wq\nsODuu6O224ATwCyNr7i9191fzdhfc/gismqSOfejmIMPQ8rDN7P7gMvAZ2MB/3Hg79z9k2b2MeA2\nd/+djP0V8EWkq6UE5W77jGqQjxvKh7bu/jXgJ4nmB4Ano+UngXfn0SkRCU8ylbKf1Mlu+yzleONu\noLRMM5sFvhC7w/+xu98eW9/2PLGv7vBFJFVnKmXv1Mlu+8DolkNOKnJ55K4R/ciRIzeXK5UKlUol\n59OLyChqplIOUr642z4wuuWQq9Uq1Wp1RY693Dv880DF3RfMbBtw0t3fnLGv7vBFJJXu8LMN84tX\nFj2aPg98IFp+GHg6hz6JSGCWUr642z4qh5xukCyd40AF+BlgATgM/E/gv9H4GtsFGmmZr2Tsrzt8\nEelKWTqdVB5ZRCQQRf7QVkRkaJp39Bs2bMj8IyghU8AXkbEwP3+CRx75LWAzV678iFLpLuAl5uaO\n8tBD7xt29wpBUzoiMvJaGTv/HXgQGP3snCaVRxYRiWmVRF4PlFEp5HQK+CIy8lolkV+jUcdRpZDT\nKOCLyMhr5d0/yLp1G4Gfp1R6q/LvEzSHLyJjYxyzdJSHLyLSp3q9zpkzZwC49957R+4/AH1oKyLS\nh/n5E9xxx50cPPgbHDz4EbZvvzvoMsm6wxeRsVSv19m58x5ef92AKqOapqk7fBGRHmq1GpOTW4E3\nEk/TnJjYHmyapgK+iIylcrnM9esLwA+Ip2neuPFisGmaCvgiMpZmZmY4duzTTE1dA34BuIvp6V8M\nOk1Tc/giMtaUpRM7lgK+iEhxqTyyiIy8fr4kNQ5/wKRINIcvIqtufv4Es7O7uP/+9/OWt7yN++9/\nhNnZXW058s1tDhz4cMc6WRpN6YjIquqnlDGMzx8hXy7l4YvIyOqnlHFrG5U5zpPm8EVkVaWXMm7c\nxc
dLGTe2SV8nS6M7fBFZVf2UMm5ts5+NG/eqzHFONIcvIkOhLJ3+KC1TpADyCkbjHtTSri/Z1nze\nlNy+uU7/ISyTu6/Ko3EqkfFw/PhTXird7ps27fVS6XY/fvypoR6nqNKuL9l26NBjN59PTd3q09Ob\nUtclx2fcx64pip35xOG8DtTzRAr4MiYWFxe9VLrd4RsO7vANL5Vu98XFxaEcp6jSrm/dus2JtpMO\npej5osNtGevax2fcxy4uz4CvD21FBpRXyuC4px6mXd/k5BuYmNgRa1sPNJ/XaC9lHF/X2F9pm8uj\ngC8yoFZaYavk7lJSBvM6TlGlXd/164vcuHEx1vYa0Hxepr2UcXxdY//m+Iz72K2YvH5V6PVAUzoy\nRprzxxs33pvLHP5yj1NUadeXbDt06NGbz6emNvj09KbUdVlz+OM6dk3kOKWTS1qmmdWAV4EbwDV3\n35eyjedxLpGiUJZOfwbJ0mneoWetCzFLp3Dlkc3s+8Db3P0nXbZRwBcZUasRWAc5x6jXuB9EEWvp\nWI7HEpECWY2qlYOcY37+BHfccScHD/4GBw9+hO3b71YlzT7leYf/CnAd+E/u/kcp2+gOX2TEtCpb\nrlzVykHOUa/X2bnzHl5/3YDqivWpSIr4Tdu3u/uPzGwG+AszO+/uX0tudOTIkZvLlUqFSqWS0+lF\nZCU00x+vXOlMf8wruA5yjlqtxuTkVhopm63tJya259qnYapWq1Sr1RU5du61dMzsMPBTd38i0a47\nfJERozv84SvUHL6Z3WJmG6Ll9cA7gW8u97giMnyrUbVykHPMzMxw7NinmZq6BvwCcBfT07+oSpp9\nWvYdvpm9EfgTwGlMEf0Xd//9lO10hy8yopSlMzyFS8vs60QK+CIiAyvih7YikiJ5Jwqk1oDv9uWk\nq1ev8r3vfY99+/axZcuWFb3THvQuu9cXpHq1Nccjeb60O/h+6udLD3l9ZbfXA5VWkMAcP/6UT03d\n6nCLw10+Obnep6c3ean0JoeSl0pvzSwB3CwbMDW1NaoYeY/DWl+z5tYVKwc8SLnhbiWOs8ogJ9uS\npZCb50uO2/T0ppvHT47duJZTiEPlkUWKbXFx0det2xwr99ss/XvSIas8sCdKCP9Jl9LB+ZYDHqTc\ncOe23a4hqy39es6dO5cYt/jxk2M3viWR4/IM+Pp2rMgKaOWLN8v91qLl9TSqQmaXAG6VEH6V7NLB\n+ZYDHqTccOe23a4hqy39ek6dOpUYt/jxk2OnksiDUsAXWQHlcpnr1xdolfstR8uv0Qh22SWAWyWE\nN5FdOjjfcsCDlBvu3LbbNWS1pV/Pvn37EuMWP35y7FQSeWB5/arQ64GmdCQwjbnoDdFc9J0+OXmL\nT09v8nXrytE89M9llgBuznWvWTMTTWfc7TDta9bcumLlgAcpN9ytxHFWGeRkW7IUcvscfmvcGnP4\njeMnx05z+EMoj9wPpWVKiJSloyyd5VIevhRW0euT1+t1Tp48ycLCAu94xzvYsmVLZmBJBrINGzZw\n8eJFXnnlFQA2b97c9qWftO3i26QdN+s/g4sXL7b1KXn8fgJer+2b509eD3CzffPmzezYsaPr+ZLX\n8fLLL/OVr3yFUqnEzp07U4856Jel4n1dyv6jLM+Arykdyc0gaX3DcPz4Uz4xUYpNkax1s1Jq+l8y\n3bCRDrg29mjtE5+qaN+u+3GzUzZ7Hb93WmKv7Vupj+19nZxc75OT62NtP9v1fMkUSrN1fRyzdV39\nvm6tvg6+/6hDaZlSNIOk9Q3D4uKir127MZHmuCkj/S8tHXBztH1yn3i6YXy7Xsdd5+kpm8mUxOTx\ne49x67VI376V+pi8nsVEW/fzdaaenouuq9sxB3t/tM6xtP3HQZ4BX9+0lVysRhnd5ajVaphtBtbR\nSOs7DWyjvcxuMr0wng64NXa09tK8sIWJifWJ7Xod9w3Allhf3hjb
v9vxy6SlJcbHuPVapG/fSn1M\nXk9zTJpt3c/XWar4yei6Sl2OOVhJ49Y5lra/tFPAl1y0p+o1StYWKWWuXC7j/gpwjVZa4CUamcnN\nPsfTC+PPXwMWaNQHJLHPWeBlbtz4cWK7XsddBH5KZ8rmAo0/IJd1/Bq9xrj1WqRv30p9TF5Pc0ya\nbd3P10o9bfZ3X3Rda7scs3WsGzde7Pn+aJ0jOab97S8Jef2q0OuBpnTG3iBpfcPQmMNfF02v3OUw\nHZvDb0//S6YbNtIBpxPz03d2zLG3b9f9uNkpm9M9jt87LbHX9q3Ux/brmZy8xScnb4m1be16vmQK\npdnajuvvPOadS5jD37Dk/UcdSsuUolKWjrJ0lKWTL6VliogEQuWRJVOed9i9viDT7Q4yuS5+15q8\nY+x2nl53zVnXHL/r3LFjR8cdc69x6udLRfFt0+6Uk3fTvb5s1O28WWMcP29T2l152vkH+fJXc7/k\ndXY7Tyh34CMlr7mhXg80h7/i8syD71XGtlued3Ld1NROz8rrjuenJ8+TngOfnp+e7FsrN/xnO/ZL\nK+XbbRy7bZ+Vzx6fY+6nJHC382aNcft51w40xoOUaG7u18ql732eIn6GM6pQHr4k5ZkH336szjK2\n3fLCO9ed9FZueXL7k55d/re5rrl/t/z39rZWbnj83NmlfJO55b1K/za3b88RT88TP3fuXM+SwM1j\npZ83fYzbz9stdz5tjLttkza2iwOeZ3nvP2mXZ8DXlM6YyDMPvv1YzRzx9jK3rVzrco918dzybqWB\nk+dJy4HPyk9vP//16w7cSnpee2cp32Ruefs4Zm8PZOSzN7abmNjOqVOnuo5l/Fjp500f48Y1Jovd\n9jvG/ZVobryOyVz6fs7Tfm2a2ikOlUceE50la5eeB99+rDLJMratMred5Wo71zVzy3+Qsn08Pz15\nnmQO/KWOfjTy0ztL8EKd9rz2tOOmj1PnOGZv38oRv5Tavxs3XmTfvn1dxzJ+rPTzpo9x4xovJR79\njvEgJZrjufT9nqdzXKUg8vpVodcDTemsuDzz4HuVse2W551cNzW1w1u55e153fH89OR50nPg0/PT\nk31r5YZv9WRee1op327j2G37rHz2tDn8biWBu503a4zbzxs/f+8xHqREc3O/Vi597/NoDj8/KA9f\nsihLR1k6ytIZL8rDZzhf8BnWmzkZ9KDzCznJLw9l7Zf2D7RXYEyuT/6DT2trBvn4z2SwiQfw+Lmb\nyy+88AJXrlxh7969TE9Pd/S5Vz/iX2bqFVyh84tBzbbkuGYdYxjviX6uK3mN0Pk+CFnR/5MKvjzy\nMMrwDqv0b2fZ3c60xHh52uxyvelpdL3SFzvXJ9Py0lMCG6mYrZ+l0lsTKYFpx4kvT0VZH43jTU+/\nOdHnXv1olRzulQLZnnKYLOmbXY532O+Jfq6ref2ta+xd8jgkRS/p7Z7vlM7IBfxhlOEdVunf9jK3\n6WVz28vTtto79zvpnWl067x7+mJyfTItL60tmZKZTEmM75c8d3P5Vk9PSWz2qVc/BkmBTCvfm1Y6\nuVf65mq/J/q5rrTrSY5puOmTRS/p3ZRnwB+5tMw80w+LfM7282aXzW1UKry1o729XG9WGl2zRG9W\n+mJyPbF1WW3llJ9pJYCTx4kvX6FRxjirz6Ue/RgkBTKtfG9a6eT2crzDf0/0c13N649fT38llkMw\nrNdwmEYu4Lenr61OGd5hnLP9vNllcxte7mhvL9drdJa6jZfoTTt+2nqPzjfRpa2W8jOtBLAnjmOx\n5evA33fp89oe/UimQHa+Zu2pksnyvc227HK8w39P9HNdP6DzevorsRyCYb2GQ5XXrwq9HqzAHP5q\npn8Nq/RvZ9ndzrTEeHna7HK96Wl0vdIXO9cn0/LS2pqpmK2fpdLPJVICp1OOE19uzuFv8/Y5/EdT\n0iDT+tEqOdwrBbI95TBZ0je7
HO+w3xP9XFfz+lvX2LvkcUiKXtLbPd8pnVyydMzsXcAf0riFmHP3\nx1O28TzO1aQsHWXpKEtHWTp5CClLZ9kB38wmgO8AvwT8kMaE4fvd/duJ7XIN+CIiIcgz4OdRWmEf\n8F13v+Du14CngAdyOK6IiOQoj4B/B40iGk0vRm0iIlIgKp4mIhKIPNIyXwJ2xp5vj9o6HDly5OZy\npVKhUqnkcHoRkfFRrVapVqsrcuw8PrSdBJ6n8aHtj4BTwEPufj6xnT60FREZUKH+pq27XzezQ8CX\naaVlnu+xm4iIrLKRrZYpIhKCoqVliojICFDAFxEJhAK+iEggFPBFRAKhgC8iEggFfBGRQCjgi4gE\nQgFfRCQQCvgiIoFQwBcRCYQCvohIIBTwRUQCoYAvIhIIBXwRkUAo4IuIBEIBX0QkEAr4IiKBUMAX\nEQmEAr6ISCAU8EVEAqGALyISCAV8EZFAKOCLiARCAV9EJBAK+CIigVDAFxEJhAK+iEggFPBFRAKx\nrIBvZofN7EUzeyZ6vCuvjomISL7yuMN/wt33Ro8v5nC8QqpWq8PuwrKMcv9Hue+g/g/bqPc/T3kE\nfMvhGIU36m+aUe7/KPcd1P9hG/X+5ymPgH/IzJ41s8+Y2aYcjiciIiugZ8A3s78ws7Oxx3PRz18D\njgJvcvc9wCXgiZXusIiILI25ez4HMpsFvuDuuzPW53MiEZHAuHsuU+drlrOzmW1z90vR0/cA38za\nNq8Oi4jI0iwr4AOfNLM9wA2gBnxo2T0SEZEVkduUjoiIFNuSs3TMbM7MFszsbKztqdiXsH5gZs/E\n1n3czL5rZufN7J2x9r3Rh8DfMbM/XPqlrFz/zWzWzP5vbN3Rgvb/H5jZX5rZGTM7ZWb/MLZuFMY/\ntf8jNP67zez/mNk3zOxpM9sQW1eY8R+k7wUd++1m9lUz+1aURPJo1H6bmX3ZzJ43sy/FswYLNv4D\n9T/X18Ddl/QA7gP2AGcz1v8B8LvR8puBMzSmkMrA92j9dvFXwD+Klv8MOLjUPq1g/2e7bFeY/gNf\nAt4ZLf9T4GS0/JZRGP8u/R+V8T8F3BctfwD4vSKO/4B9L+LYbwP2RMsbgOeBXcDjwL+P2j8G/H5B\nx3/Q/uf2Giz5Dt/dvwb8pMsm7wWOR8sPAE+5+9+7ew34LrDPzLYBt7r76Wi7zwLvXmqfBtFn/+dj\nzzs+dC5g/28AzbuazcBL0fKvMxrjn9V/GI3xvztqB/gK8GC0XKjxH7DvULyxv+Tuz0bLl4HzwHYa\ncebJaLMnY/0p2vgP2n/I6TVYkeJpZvZPgEvu/v2o6Q7gYmyTl6K2O4AXY+0vRm1DFev/38aay9Gv\nUyfN7L6orWj9/7fAH5jZC8AngY9H7aMy/ln9h9EY/2+Z2a9Hy++l8Y8YRmP8s/oOBR57MyvT+G3l\n68BWd1+ARlAF3hBtVtjx77P/kNNrsFLVMh+i/e541CT7/0Ngp7vvBX4bOB6fny2Qfw085u47aQTP\nY0Puz6Cy+v8jRmP8/yXwETM7DawHrg65P4PI6nthxz7qx+dovGcuA8kMlEJnpAzQ/9xeg+WmZXYw\ns0kaOfl7Y80vATtiz7dHbVntQ5PWf3e/RvQrsLs/Y2Z/C9xD8fr/sLs/BuDunzOzz0TtozL+yf7P\nRctXiQJQkcff3b8DHAQws7uBX4lWFX78s/pe1LE3szU0guUfu/vTUfOCmW1194VoumMxai/c+A/S\n/zxfg+Xe4Rudc0sHgPPu/sNY2+eB95vZtJm9EbgLOBX92vKqme0zMwP+BfA0q6ev/pvZFjObiJbf\nRKP/3y9g/18ys/ujfv4SjblKGJ3xT/b/O9HySIy/mc1EPyeA3wU+Ha0q4vj31fcCj/0x4Jy7fyrW\n9nkaHzgDPBzrTxHHv+/+5/oaLOOT5uM0pjr+H/AC8JtR+38G/lXK9h+n8en4eaJMjKj9bcBzNI
LT\np5bz6fdK9Z/Wt4ifAf4a+OUi9h/4x1H/zgB/Cdw7SuOf1f8RGv9HaWRcfBv4j0V9/w/S94KO/duB\n68Cz0XvlGeBdwO00PnB+HvgysLmg4z9Q//N8DfTFKxGRQOhPHIqIBEIBX0QkEAr4IiKBUMAXEQmE\nAr6ISCAU8EVEAqGALyISCAV8EZFA/H/OQknmKtr1SQAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x1165d46d8>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from matplotlib import pyplot as plt\n",
    "%matplotlib inline\n",
    "\n",
    "x = []\n",
    "y = []\n",
    "for date, count in maxforyear.items():\n",
    "    x.append(date)\n",
    "    y.append(count)\n",
    "\n",
    "plt.scatter(x, y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### all vols in a year"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.collections.PathCollection at 0x11eccc2e8>"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYMAAAEACAYAAABRQBpkAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3X901fWd5/HnG5J7cyEJgW0UByRXBQS1DKEr207daayi\ndnqqrd2KzMzZttKOllKo09OKOj3QUlrRWsfuKUUsjnaPBDw9M63tsaJOye6ynRpWodiGKm0N/qiS\nu3XqFooS5L1/fL/f3G9ubiA3ueH+yOtxTg7ffO/3fu/7foHP+/v5+TV3R0RExrZxpQ5ARERKT8lA\nRESUDERERMlARERQMhAREZQMRESEApKBmSXN7Ekz221mz5jZ6nD/ZDN7zMyeNbPtZjYp9p6bzWy/\nme0zs8tG4wuIiMjIWSHzDMxsgrv/yczGA/8bWAF8GPi9u99uZjcBk919lZmdBzwIXAhMB54AZrkm\nNoiIlJ2Cmonc/U/hZhKoARy4Cngg3P8A8MFw+0pgq7sfc/duYD+wcKQBi4hI8RWUDMxsnJntBl4F\nHnf3XcDp7n4QwN1fBU4LD58GvBh7+8vhPhERKTOF1gyOu3srQbPPQjM7n6B20O+wYgUnIiKnRs1w\n3uTu/8/MOoArgINmdrq7HzSzqUBPeNjLwJmxt00P9/VjZkoeIiLD4O5WrHMVMprobdFIITNLAYuA\nfcDDwMfCwz4K/CDcfhi41swSZnYWMBPozHdud6/Yn9WrV5c8BsVf+jjGYvyVHHs1xF9shdQMzgAe\nMLNxBElkm7s/YmY/Ax4ys+uAA8A1YQHfZWYPAV1AL7DMR+MbiIjIiA05Gbj7M8CCPPtfAy4d5D1f\nA7427OhEROSU0AzkEWprayt1CCOi+EurkuOv5Nih8uMvtoImnY1KAGZqPRIRKZCZ4aXoQBYRkeql\nZCAiIkoGIiKiZCAiIigZiIgISgYiIoKSgYiIoGQgIiIoGYiICEoGIiKCkoGIiKBkICIiKBmIiAhK\nBiIigpKBiIigZCAiIigZiIgISgYiIoKSgYiIoGQgIiIoGYiICAUkAzObbmY/MbNfmtkzZvaZcP9q\nM3vJzJ4Of66IvedmM9tvZvvM7LLR+AIiIjJy5u5DO9BsKjDV3feYWT3wFHAVsBj4o7t/I+f4ucAW\n4EJgOvAEMMtzPtDMcneJiMhJmBnubsU635BrBu7+qrvvCbcPAfuAaVFced5yFbDV3Y+5ezewH1g4\nsnBFRGQ0DKvPwMzSwHzgyXDXcjPbY2bfMbNJ4b5pwIuxt71MNnmIiEgZqSn0DWET0feAle5+yMw2\nAF92dzezrwB3Ap8o5Jxr1qzp225ra6Otra3QsEREqlpHRwcdHR2jdv4h9xkAmFkN8CPgx+5+d57X\nW4Afuvs8M1sFuLuvD197FFjt7k/mvEd9BiIiBSpZn0HoPqArngjCjuXI1cAvwu2HgWvNLGFmZwEz\ngc6RBCsiIqNjyM1EZvZu4G+AZ8xsN+DALcBfm9l84DjQDVwP4O5dZvYQ0AX0AstUBRCRcpXJZOju\n7iadTtPc3FzqcE65gpqJRiUANROJSIm1t29j6dJlJBJpjh7tZvPmDSxZsrjUYZ1QsZuJlAxEZEzL\nZDK0tMzhyJEdwDxgL6nUxRw48KuyriGUus9ARKSq3HPPvRw5MoUgEQDMo7a2he7u7hJGdeqpZiAi\nY1Ymk2HGjNm88YYBHahmICIyBnV3d5NMng18G7gYWAC8i1tu+VxZJ4LRoJqBiIxZ/fsLzgAep67u\n07zwwnNlnwxUMxARKZLm5mY2b95AKnUxjY2Xk0p9hvvu21j2iWA0qGYgImNeJc4x0NBSERFRM5GI\niBSfkoGIiCgZiIiIkoGIiKBkICIiKBmIiAhKBiIigpKBiIigZCAiIigZiIj0k8lk2LVrF5lMptSh\nnFJKBiIiofb2bbS0zGHRohtoaZlDe/
u2Uod0ymhtIhERKu/xl1qbSERkFHR3d5NIpBmrj79UMhAR\nAdLpNEePdgN7wz176e09QDqdLl1Qp5CSgYgIuQ+6WUAqdTGbN28oyyai0TDkPgMzmw58FzgdOA7c\n6+7fNLPJwDagBegGrnH318P33AxcBxwDVrr7Y3nOqz4DESkblfKgm5I93MbMpgJT3X2PmdUDTwFX\nAR8Hfu/ut5vZTcBkd19lZucBDwIXAtOBJ4BZuSW/koGISOFK1oHs7q+6+55w+xCwj6CQvwp4IDzs\nAeCD4faVwFZ3P+bu3cB+YGGR4hYRkSIaVp+BmaWB+cDPgNPd/SAECQM4LTxsGvBi7G0vh/tERMra\nWJx4VlPoG8Imou8R9AEcMrPcNp6C23zWrFnTt93W1kZbW1uhpxARKYr29m0sXbqMRCIYXbR58waW\nLFlc6rDo6Oigo6Nj1M5f0KQzM6sBfgT82N3vDvftA9rc/WDYr7DD3eea2SrA3X19eNyjwGp3fzLn\nnOozEJGyUEkTz0o96ew+oCtKBKGHgY+F2x8FfhDbf62ZJczsLGAm0DmCWEVERtVYnng25GYiM3s3\n8DfAM2a2m6A56BZgPfCQmV0HHACuAXD3LjN7COgCeoFlqgKISDnrP/EsqBmMlYlnWptIRCSUyWS4\n5557+epX76S2toXe3gNl02eQq9jNRAV3IIuIVKN4x7H7cT7/+f/C9dd/suz6CkaLagYiMuZVUsdx\npNQdyCIiVWcsdxxHlAxEZMwb6yuWgpKBiMiYX7EU1GcgItKnUlYshRKuWjpalAxE5FSopIJ+KNSB\nLCJSoLH8oPuhUs1ARKpaJQ4bHQrVDERECrB7927GjTuTsTxsdCiUDESkarW3b+OqqxZz+PB+xvKw\n0aHQchQiUpUymQxLly7jjTf+B8GDGduAKaRS/z7mho0OhWoGIlKV+s8qfi+wlVTK+f7328ty4blS\nU81ARKpSdlbx7QQr7Z/JkSOv8PzzB0obWJnSaCIRqVr33HMvN9ywkuBx7dUzkgg0mkhEZMgWLJhP\nQ8McNJLo5JQMRKRqpdNpjh07gEYSnZySgYhULS1AN3TqMxCRqldt6xKBFqoTERm2TCbD7t27AWht\nba3oxKAOZBGRYWhv38a0aedw+eUf4vLLP8306bO0YF2MagYiUvUymQwzZszmjTcM6KAahpmqZiAi\nUqDu7m7Gjz8dOIv4MNNx46ZrmGloyMnAzDab2UEz2xvbt9rMXjKzp8OfK2Kv3Wxm+81sn5ldVuzA\nRUSGKp1O89ZbB4HniQ8zPX78JQ0zDRVSM/gn4PI8+7/h7gvCn0cBzGwucA0wF3gfsMHMiladEREp\nRHNzM/fdt5Ha2l7gXcBMEom/1DDTmCGvTeTuO82sJc9L+Qr5q4Ct7n4M6Daz/cBC4MnhhSkiMjJL\nlizm0kvfWzWjiYqtGH0Gy81sj5l9x8wmhfumAS/Gjnk53CciUjLNzc1cdtlltLa20t3dTSaTKXVI\nZWOkq5ZuAL7s7m5mXwHuBD5R6EnWrFnTt93W1kZbW9sIwxIRyYpPOnviiZ+wdOkyEolgVdO77rqN\nBQvml/2EtI6ODjo6Okbt/AUNLQ2biX7o7vNO9JqZrQLc3deHrz0KrHb3Ac1EGloqIqOpvX1bX+H/\n5pu/5fhx5+jR/wmcAawFvkNDwxyOHTvA5s0bKuZZB6UeWmrE+gjMbGrstauBX4TbDwPXmlnCzM4C\nZgKdIwlURKRQ0dPOjhzZweuvP8Ubb3yLo0ebCZ58NhvYDPyMP/7xaY4c2cHSpcv6mo4ymQy7du0a\nM01JhQwt3QL8FJhtZi+Y2ceB281sr5ntAd4D3Ajg7l3AQ0AX8AiwTLf/InKq9X/aGcAigi7MTwHf\nAvIvb93evo2WljksWnQDLS1zxsRMZc1AFpGqlclkaGmZw5EjO4hmHY8f/07eemsawb3tHCD7Wip1\nMU
89tZN3vOOifu8px5nKpW4mEhGpGPmWsP7Wt+4mlXoNeIVgDEwbMLNveetDhw7l1CbGxgNx9Axk\nEalq0fyC+BLWjY2NLF16MbW1LRw96tx663Vcf/0naW5uJpPJhM9O3ktUMxgLD8RRM5GIjEknesZB\nNAKptraF3t7yHGWk5xmIiJwC5f5AHCUDERFRB7KIyFCNtbkCI6FkICJVaSzOFRgJNROJSNXJN7+g\nHOcKjISaiURETmLgzOOxMVdgJJQMRKTqpNPp2FwBGCtzBUZCyUBEqk6+mcd6qtmJqc9ARKpWuc8V\nGAnNMxAREXUgi4hI8SkZiIiIkoGIiCgZiIgISgYiIoKSgYhUKS1SVxglAxGpOlqkrnCaZyAiVWUs\nLFIHmmcgInJCWqRueJQMRKSqaJG64RlyMjCzzWZ20Mz2xvZNNrPHzOxZM9tuZpNir91sZvvNbJ+Z\nXVbswEVE8tEidcMz5D4DM7sIOAR8193nhfvWA79399vN7CZgsruvMrPzgAeBC4HpwBPArHydA+oz\nEJHRUM2L1EGJF6ozsxbgh7Fk8CvgPe5+0MymAh3uPsfMVgHu7uvD434MrHH3J/OcU8lARKRA5daB\nfJq7HwRw91eB08L904AXY8e9HO4TEZEyVFPk8w3rFn/NmjV9221tbbS1tRUpHBGRwVVSU1JHRwcd\nHR2jdv6RNhPtA9pizUQ73H1unmaiR4HVaiYSkXLR3r6NpUuXkUgEo482b97AkiWLSx3WkJW6zyBN\nkAzeHv6+HnjN3dcP0oH8nwiahx5HHcgiUiaqYWJayfoMzGwL8FNgtpm9YGYfB24DFpnZs8Al4e+4\nexfwENAFPAIsU4kvIuVCE9MG0nIUIjLmqGYwkGYgi8iYo4lpA6lmICJjViWNJspV0g7k0aBkICJS\nODUTiYhI0SkZiIiIkoGIiCgZiIgISgYiUoUymQy7du0ik8mUOpSKoWQgIlWlvX0bLS1zWLToBlpa\n5tDevq3UIVUEDS0VkapRDTOLh0pDS0VEBqE1h4ZPyUBEqkY6HSxHDdGj2vfS23uAdDpduqAqhJKB\niFQNrTk0fOozEJGqU8lrDg2V1iYSERF1IIuISPEpGYiIiJKBiIgoGYiICEoGIiKCkoGIiKBkICJV\nRiuWDo+SgYiUvaEW8FqxdPiKMunMzLqB14HjQK+7LzSzycA2oAXoBq5x99fzvFeTzkRkUO3t21i6\ndBmJRLDu0ObNG1iyZPGA48bSiqVQvpPOjgNt7t7q7gvDfauAJ9z9XOAnwM1F+iwRGSMymQxLly7j\nyJEdvP76Uxw5soOlS5flrSFoxdKRKVYysDznugp4INx+APhgkT5LRMaIQgp4rVg6MsVKBg48bma7\nzOwT4b7T3f0ggLu/CpxWpM8SkSoW7x8opIDXiqUjU1Ok87zb3V8xs2bgMTN7liBBxA3aMbBmzZq+\n7ba2Ntra2ooUlohUknj/wJtv/pZbb/08d911GzfeeDG1tS309h44YQG/ZMliLr30vVW5YmlHRwcd\nHR2jdv6ir1pqZquBQ8AnCPoRDprZVGCHu8/Nc7w6kEUkpwN4H/ApoIlk8vd85SurmTfvAgBaW1sB\n8hb4Y2Hp6kjZLWFtZhOAce5+yMwmAo8BXwIuAV5z9/VmdhMw2d1X5Xm/koGIsGvXLhYtuoHXX38U\nmAPcBKwHzgR+RW1tkgkTZvKnP+3HbByp1Dn9RhcNddRRtSjHZHAW8C8EzUA1wIPufpuZTQEeIvib\nPEAwtPQPed6vZCAisZrBfwNuA14GdgBnAOcCHTnb2eGjTz21k3e846IxM6wUip8MRtxn4O7PA/Pz\n7H8NuHSk5xeRsSHqAL7uuht4440jBLWDecAu4Kw82xCNLurs7CSRSHPkyMBRR9WaDIpNM5BFpGws\nWbKYF154jlWrPgc8RzCKKA08n2cbotFFCxcu1LDSESrWaCIRkaJo
bm7ma19bRzqdZuXK95BIpHnj\njV7M/pK6urM5ciTYTiTOpLf3Be666+vs2bOXY8eOAu8CziCR+L9s3nyPagUF0DOQRaRsxUcHQXYE\n0T//8/dZufIL1NRMp7e3G/dx9Pb+L4I+hcepq/s0L7zwXFUng7LrQB5xAEoGIlKAbEdzNNroPxCs\niPPrvmMaGxfwxBP3cOGFF5YoytFXdh3IIiIjMdjdf3RXn/v6I488wrhxf0aQCHYAtQTNQ3uJRhKp\nv6BwSgYiJTKWJkhB/u8bnxtw+PCzmI2nru4cenuf59ZbP09zczM33riKRCLdN78gmZzB4cP7CUYb\n7QOWAZOAd5JIpBk//qCWoRgOdy/pTxCCyNiyZctWT6WmeEPD2z2ZbPSNGze5u3tPT493dnZ6T09P\niSMsjuj7bNy4yVOpKT5p0gJPpab4li1bvaenx1OpKQ4/d9jkkAq3tzpMdjgrtq8n3BdtL3WoC/et\nd5jicL5D0u+4485Sf+1TIiw7i1cWF/NkwwpAyUDGmGwhGBVif+6Q8uuu+7sBBWbu++KJolwSx2Bx\nxBNetlB3h597KjXFV626xWFmWLg3htehy6EpPLYz3Ofh9oIwUUwJt2vChDGl37mTyaaSX5NTQclA\npMJ1dnaGBWS8ENuRt8CMCrWoYI0SxfLlK/PeaZ/q5JAbV5TA+t/1rwsLfe/7qa+/wBOJxvDOfnlY\nqDeEP7PD43pi16jHYVKsdhBds2QsYQQ/DQ3zvbOz85Rdg1JRMhApcycrlHt6ejyZbMwpxDpzCsFO\nr6+/oO882YI1f+KoqZnoyWSTT5x4Qb9mp9H+nv3jyiawzs5OnzRpQfhdmnIK8Z97MtnoDQ2tYfNQ\nXVjQN4bfLX7O9Q4pb2iY7+PHpwYklWTyzBMm0WpW7GSgGcgiRTSUZ/A+8cRPeOutt8jOsAU4DLwI\n3E7QMfpJDh36DU8/vSfPA14mEiz5Ff3+JMeOHefNN2/m8OHf8eabZ3HDDSu55557R++LArt372bc\nuHgc2SUgss8heBw4G/g2cDGwAHgnixdfzbFjB8LvMhf4AsEjT9qADeGxs0kmv8bGjXfzr/+6iWee\neYpU6jXis4zHjTvMHXd8hWTyPTQ0tOoZBiNRzMwynB9UM5AqcaI75fzHbArviNOeTDb6kiV/O+Au\nt66uybdt23aCmkGXw0QPOk9Ht+08uuPv6urytWvXeV1dk8OEWDPOtz2RqPedO3f62rXrvLa2Powz\nfsxnwjv9Vq+trffa2obw9dwawQ5PJhu9q6urXwxRs1RjY+uAZqly6D85lVAzkUh52r59u0+c2L/9\nurGxtV/7dbb5JDpmk8MEnzBhnieTjV5Xd0GsqWhd+Nr5Pn58ymtrG72xsdXr6pr8Ix9ZHBakybC9\nPbfZafht5/kK1o0bN3ky2eTJ5Flhs05UwG91qA/jmOBwRs7rm8LtCQ7pnGS3wxOJel+xIuj/qKsL\nXk+lLsjbgX6i+MYiJQORMrRly9acO+VszaCrq6uv8OpfM4gPl4zf8a8P29kneP8RR3V++eXv6+sb\nCI6N7qi/WJS283wdwhs3bsr5rOWxtvuuMBFNjr3+oAejfXI7gL8ce180Kmi2J5NNvnHjpr5ahwr6\noVEyECkz/Qv4aIz8Of1G/cTnE0R32RMmzM4pVO/32tqpYcEbjbCJN53EC/ztDm/PKVhPc0h5Xd35\nJ7yzHkxXV5cnk02ee+eeSDSEyagz/MyoQ3h9mAhmhIV/NPwzSgJRUohqKz2ev0lo7HT6FpOSgUiZ\niJortm/fntP00+MTJ86OtfXHJ0XVem1tozc0vN0TiXqvrW10+OuwkJ/tkPSamplhgRsVwlFBGjUF\nbY3VHAa215/szjpfM8uWLVvDEU7RiKYowcyIJaUdsRjitYUoOcQL+a0e9If0rynV1Ez0cePqHM7x\neJNWbnOanJySgUgJ5ZtRm0w2
hoV6VDA/6HV1Tb59+/bYfIJ400+2cBw/foLntqNnx87HZ+Vud5jr\n/YdpRu31ExxmeiIx6aTt7CeeCRwV5vFCPRoauj5MTokwpgcdWmOJI4rjdA9qJ3M8kaj3K6/8UF+H\nb21tQ9jPkfLcoaaqGRROyUCkSArtiMw/ozZqFjrNs52oQcEcNAc1enakT7ytPfipq5sRuxuPfk6L\nnX+TB6OFolE58XPE2+QHFqo9PT2+fft2v+mmmweZCRw0A61bty4c8+/h59U7zIr9Hv/8KKYG75/Y\ngnPde++9vmLFZz2ZbPKGhmDEz8aNm3z79u1hwomajqKaR6vDBF+7dt1o/lVXJSUDkSKI2u3jBVZu\nYogK0+3bt3tXV1fs7vl+h3mxwjj/XX/QJDIhTBLn+2CTr/qvv/OgQ52vXv2lnI7ifE0v8eUagp+o\nuWXjxk0+fnxuId4Zxh3dzUcFerQGUNSc9Wfh74P1UbhDl9fUvM0TiUl9wzyXL185aCd6tiktt1M5\nqEWpVlA4JQMpK6dymN/J1uYZSiw9PT3hmjjxO+RolmswbHPVqlt8xYrPhoVpcKdfUzPRa2tnhAVZ\ndIf9oMMFnr+z9M7YZ9zp2aafgXfEy5ev8Gyt4hyvqWnoa765//77Y3ftQW0gkTjDk8kmr6+PJ4ps\nwXvHHXd6dhG3eE0iimlHTmKKmqNyRyfFl3qIF+JRbWim19U1+dq162LJMvc6BAkqWzMY2MleaEe3\nBJQMpGwMti7NUA1WmOcbXniitXnq6pr8Ax/4UL87/XyxZId/DlbIRU0iiVhhGhW0/5IngSRi58o9\nT+6aOfHk0P+OuKenx+vq8rehDzaRLbpGUR9AdHeebZqa5f37GKK7/NwRQFEiiPoAohFBnbFEF//O\n8TkEJ7vz7/9d4hPGoiSiGsHwKRlIyUXNJ4PNts13hx5vcokXDFFh/pGPXOup1BRPpc72YOLR2/Pc\ndeabgbvVs23aA9uvo8/LDpvMHbIZNX/E18hpCAvT+B1up2fb0XsctoWF4hd9YDNO1CzU//qMH58K\nE9b8fglr4ES0/qNrBpt1G7+20fVeu3adZyehxTui48kpmhsQfdd4MtuR82fUdDTPIeUrVqzMO7Fu\nqHf+mjBWPEoGUjTD+Y8ZFUwTJ57r/TtDg+GUUWdlVMivXbvON27cFFt2IGpyiUbf3BkW5rnjz7NN\nEcFY9/NyCubZnh3pEh+CGbWFR80uZ7lZMvw97f3HyEcjW1KxBHCu9x9Ln29CWHT+eCdrk8Msr6mZ\n6InEuXkL0o0bNw2aKIeyjMVQmsCCmk/UPBR1RMebs+JzFmrCa5J7zYMRQbW1wSJwdXVz+uZInChW\n3fmfWhWXDIArgF8RrMp1U57Xi3qBSnHnUcq7nXjTSvzOO3ot3gEanwW7du26AZOh8jXbRO/fuXOn\nf/Ob34w1Z0R3l/E1dmZ4/1Ew0b54k0vUFDPL4ZM+sImi1Qc2ucSHX/Y43O7ZiVnTw8I7uptt8uxS\nx+vDz4rf8ecbpx/dTee+NxoumQ4L/7rYd4knriBZJJONvnPnzpy5Bef5UB64crK7/6HI1jDi3++L\n4d9TfDZzi0PKE4mZ3r/jeJ5Dna9adUvfv5cTNdnli7Xa7vzL+ftUVDIAxhE8pbqF4EGle4A5OccU\n7eKMtA27Uj4z97ODppX+wxqXL18Zuxs/w6Oml+xY79ylDpJeW9vYr00+e9zksNCY4cFkofjddyJW\noEwIX9+Us2+a92+fjjonc8fTR00uUadn1HQTv5NPxj43KuQn53zWOZ69u483+eSO249Gx8RHBcXX\n2UmHtYo6zyat2Z5tPoo6g2d5MtnU93cf/b3U1xe2nPRIC56BS10sd0j5xIkX9K1tNHHiHB/Y95Eq\neOnrci4ki6WU/7eHotKSwTuBH8d+X5VbOyhWMhhKVbvYSvGZAz87uhvObdLIdwcbrYXzoGfvpg
db\nIyd6f7zjNHrAyKTYuaNx9E2x1xry7Ivvn+LwEc/ejcc7OL/o+dvue7z/nXtUm4gmPMWbe+pi781t\n8smuFDpYoRgtnrZt27aw2SXerj7wWudbXbNUhWXuXXt8yGz+0UnBg2buv//+qi7YC1XK/9tDVWnJ\n4MPAptjvfwt8M+eYolyYk3XCjYZSfObAz+70oJ07t7Mzd70Y9/5rx8RXuew8wfvv9/6TotZ59k4/\nWqsmGr2ywLNNLvF9W8OCN+39H28YrwVETUTRn/Gmm6jwPzf2XXPHq98Unmedw1TPNg/lNvmc4zU1\n9X19GYMVnNE1DvpG4okz6rAOznWiWb+lcqJEVAmFXDko5f/toSp2MqgZ/pMQimfNmjV9221tbbS1\ntRV8juzDNPYSPGxjL729B0in00WJsVw+c+BnHwYOAhaL4zDQA/wx3I5iTAPPA68AXwdW5uzP9/5J\nBA9diV77C2At8KfwuJfD89wBpMLX78zZNxd4iqCiuDL87F7gk8AmgtbEw8CB2J/ReW4OP+8T4TdP\nhMe/QvAQlDZgCnV1v+f4cefo0eh9PwIeA/4R+BJwGuPH9/D5zy/m7//+s30PP7n66g/2PYwl94Eo\n6XSat96Kru0qggeuTAfeYsWK63n/+/+K1tbWsnuQSnNz86AxNTc3s3nzBpYuvZja2hZ6ew/oYTB5\nlPL/9mA6Ojro6OgYvQ8oZmbJ/SH43/9o7PdRayZyL04nXCV8Zu5nB+vAJ/rdrS5fviJ8uEh2vZhU\n6gKvra3vmzUajeppaJjfb3/QZxB//6TwLn6WQ8ovu+x9A85dUzMtFkO+fef4+PETPJGY1Peowrq6\n8z2ZbPRLLlnkdXWT+9azj0axxEezROvcBJ+bnaBVW9vQN2ql//UIvm80iSzesV7oNc5+17TX1Ew8\nJY+UHG1joc1/pEr5f3soKHLNwIJzjg4zGw88C1xCcCvXCSxx932xY7yYMWQymUHv9EZLKT4z97Pr\n6+t58cUXAfruVjOZDLt37wbgzDPP5NChQ313NlG8g23nvn/ixIn8+te/ZuHChcydOzfvueMx5NvX\n2tra9xn19fV98USfFd+f+2duTH/4wx9oamoacGeee55i/J3Ev2s51gRk9JTy//bJmBnubkU732gm\nAwAzuwK4m6Buv9ndb8t5vajJQERkLKi4ZHDSAJQMREQKVuxkMK5YJxIRkcqlZCAiIkoGIiKiZCAi\nIigZiIgISgYiIoKSgYiIoGQgIiIoGYiICEoGIiKCkoGIiKBkICIiKBmIiAhKBiIigpKBiIigZCAi\nIigZiIhS8t2zAAAFPUlEQVQISgYiIoKSgYiIoGQgIiIoGYiICEoGIiLCCJOBma02s5fM7Onw54rY\nazeb2X4z22dml408VBERGS3FqBl8w90XhD+PApjZXOAaYC7wPmCDmVkRPqvsdHR0lDqEEVH8pVXJ\n8Vdy7FD58RdbMZJBvkL+KmCrux9z925gP7CwCJ9Vdir9H5TiL61Kjr+SY4fKj7/YipEMlpvZHjP7\njplNCvdNA16MHfNyuE9ERMrQSZOBmT1uZntjP8+Ef34A2ACc7e7zgVeBO0c7YBERKT5z9+KcyKwF\n+KG7zzOzVYC7+/rwtUeB1e7+ZJ73FScAEZExxt2L1hdbM5I3m9lUd381/PVq4Bfh9sPAg2Z2F0Hz\n0EygM985ivllRERkeEaUDIDbzWw+cBzoBq4HcPcuM3sI6AJ6gWVerCqIiIgUXdGaiUREpHKNygxk\nM9tsZgfNbG9s39bY5LTnzezp2Gt5J6iZ2YKws/o5M/vH0Yh1pPGbWYuZ/Sn22oZSxj9I7H9uZv9m\nZrvNrNPM/mPstUq49nnjL7drf4L455nZT83s52b2AzOrj71WCdc/b/zldv3NbLqZ/cTMfhkOdFkR\n7p9sZo+Z2bNmtj026rGsrn+h8Rf9+rt70X+Ai4D5wN5BXv
868A/h9lxgN0GTVRr4Ndkay5PAheH2\nI8DloxHvCONvOcFxpzz+fLED24HLwu33ATvC7fMq4dqfIP6yuvYniL8TuCjc/hjw5Qq7/oPFX1bX\nH5gKzA+364FngTnAeuAL4f6bgNvK8foPI/6iXv9RqRm4+07g309wyDXAlnA77wQ1M5sKNLj7rvC4\n7wIfHI14cw0x/vbY7wM6wUsV/yCxHweiu6EmgnkfAFdSGdd+sPihjK49DBr/rHA/wBPAh8PtSrn+\ng8UPZXT93f1Vd98Tbh8C9gHTCcqYB8LDHojFUlbXfxjxQxGv/ylfqM7M/jPwqrv/Ntw12AS1acBL\nsf0vUQYT12Lx/ya2Ox1W03aY2UXhvnKK/0bg62b2AnA7cHO4v1Ku/WDxQ/lfe4BfmtmV4fY1BP/B\noXKu/2DxQ5lefzNLE9Rwfgac7u4HIShwgdPCw8r2+g8xfiji9S/FqqVL6H9XXWly4/8dMMPdFwCf\nA7bE24TLxKeAle4+g6Bgva/E8RRqsPhfofyvPcB1wKfNbBcwETha4ngKNVj8ZXn9wxi+R/Bv5hCQ\nO0qmrEfNFBB/Ua//SIeWFsTMxhPMR1gQ2/0ycGbs9+nhvsH2l0y++N29l7Ba7e5Pm9lvgNmUV/wf\ndfeVAO7+PTP7Tri/Uq59bvybw+2jhAVTGV973P054HIAM5sFvD98qSKu/2Dxl+P1N7MagoL0v7v7\nD8LdB83sdHc/GDah9IT7y+76FxJ/sa//aNYMjIHtWYuAfe7+u9i+h4FrzSxhZmcRTlALq0Ovm9lC\nMzPgvwI/4NQZUvxm9jYzGxdun00Q/29LHH9u7C+b2XvCGC8haBuFyrn2ufE/F26X47UfEL+ZNYd/\njgP+AdgYvlQR13+w+Mv0+t8HdLn73bF9DxN0fAN8NBZLOV7/Icdf9Os/Sr3iWwiaT94EXgA+Hu7/\nJ+Dv8hx/M0FP/j7CUSPh/ncAzxAUXnePRqwjjZ/szOungf8D/FUp488XO/AXYWy7gX8DWivp2g8W\nf7ld+xPEv4JgZMivgK9W2r/9weIvt+sPvBt4C9gT/lt5GrgCmELQ8f0s8BjQVI7Xv9D4i339NelM\nRET02EsREVEyEBERlAxERAQlAxERQclARERQMhAREZQMREQEJQMREQH+P/9RyLYhg6RAAAAAAElF\nTkSuQmCC\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x12911ac50>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "x = []\n",
    "y = []\n",
    "for date, count in datedist.items():\n",
    "    x.append(date)\n",
    "    y.append(count)\n",
    "\n",
    "plt.scatter(x, y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2009\n"
     ]
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYMAAAEACAYAAABRQBpkAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3X2UXXV97/H3d55P5iEPZQggMMNDIVSMZNp4tfWWCSao\ndF311iuYdbuuQrSgRaiXSkKoEi8LF0FbJbYYaINil3mwPt+70oZmMccu762dKRml7STRSidi1cy0\nQBaRkITwvX+cfWb2OWefM+dhn5l9Zj6vtWZxzj5n7/3bG9jf83v6/szdERGRha1prgsgIiJzT8FA\nREQUDERERMFARERQMBARERQMRESEGIKBmV1mZqNmdiD45zEzu83MlprZ42Z22Mz2mdniOAosIiLx\nszjnGZhZE/AT4D8BtwL/4e4PmNlGYKm7b4rtZCIiEpu4m4nWAj9y92eAtwOPBdsfA94R87lERCQm\ncQeDG4Cdwevl7n4UwN1/Dpwd87lERCQmsQUDM2sF3gb8ZbApv/1JeS9ERBKqJcZjvRV40t3/PXh/\n1MyWu/tRMzsHmIjaycwUJEREquDuFtex4mwmWg/sCr3/FvDe4PV7gG8W29HdG/bvnnvumfMyqPxz\nX46FWP5GLvt8KH/cYgkGZraITOfx10KbtwLrzOww8Cbg/jjOJSIi8YulmcjdXwR687Y9SyZAiIhI\nwmkGco0GBwfnugg1UfnnViOXv5HLDo1f/rjFOumsqgKY+VyXQUSk0ZgZntAOZBERaVAKBiIiomAg\nIiIKBiIigoKBiIigYCAiIigYiIgICgYiIoKCgYiIoGAgIiIoGIjIPDc5OcnIyAiTk5NzXZREUzAQ\nkXlr16499PWtYN26W+jrW8GuXXvmukiJpUR1IjIvTU5O0te3ghMnhoCVwFOkUms4cuQQvb29M+2e\neEpUJyJShvHxcdra+skEAoCVtLb2MT4+PneFSjAFAxGZl/r7+zl1ahx4KtjyFKdPH6G/v3/uCpVg\nCgYiMi/19vayY8dDpFJr6OkZIJVaw44dD82LJqJ6UJ+BiMxrk5OTjI+P09/fP68CQdx9BrEEAzNb\nDPw5cCXwCnAT8ANgD9AHjAPXu/uxiH0VDEREKpTUDuQHgb3ufgXwWuAQsAnY7+6XA08Ad8V0LhER\niVnNNQMz6wFG3f2SvO2HgKvd/aiZnQOk3X1FxP6qGYiIVCiJNYOLgH83s8+b2QEze8TMFgHL3f0o\ngLv/HDg7hnOJiEgdxBEMWoAB4E/dfQD4BZkmovyf+/r5LyKSUC0xHOMnwDPu/g/B+6+SCQZHzWx5\nqJlootgBtmzZMvV6cHCQwcHBGIolIjJ/pNNp0ul03Y4f12iibwPvd/cfmNk9wKLgo2fdfauZbQSW\nuvumiH3VZyAiUqGkDi19LZmhpa3A08CNQDPwZeAC4AiZoaXPR+yrYCAiUqFEBoOaCqBgICJSsSSO\nJhIRkQanYCAiIgoGIiKiYCAiIigYiIgICgYiIoKCgYiIoGAgIiIoGIiICAoGIiKCgoGIiKBgICIi\nKBiIiAgKBiIigoKBiIigYCAiIigYiIgICgYiIoKCgYiIAC1xHMTMxoFjwCvAaXd/nZktBfYAfcA4\ncL27H4vjfCIiEq+4agavAIPuvsrdXxds2wTsd/fLgSeAu2I6l4iIxCyuYGARx3o78Fjw+jHgHTGd\nS0REYhZXMHDgb8xsxMzeF2xb7u5HAdz958DZMZ1LRERiFkufAfAb7v4zM+sFHjezw2QCRFj++ylb\ntmyZej04OMjg4GBMxRIRqczk5CTj4+P09/fT29s718WZkk6nSafTdTu+uRd9Rld3QLN7gOPA+8j0\nIxw1s3OAIXe/IuL7HncZRESqsWvXHjZs+CBtbf2cOjXOjh0PsX79DXNdrEhmhrtbbMer9UFsZouA\nJnc/bmadwOPAx4E3Ac+6+1Yz2wgsdf
dNEfsrGIjInJucnKSvbwUnTgwBK4GnSKXWcOTIoUTVELLi\nDgZxNBMtB75uZh4c70vu/riZ/QPwZTO7CTgCXB/DuURE6mJ8fJy2tn5OnFgZbFlJa2sf4+PjiQwG\ncas5GLj7vwJXRWx/Flhb6/FFRGZDf3+maQieIlszOH36CP39/XNartmiGcgiIkBvby87djxEKrWG\nnp4BUqk17Njx0IKoFUAdOpArLoD6DEQkQZI6mihf4jqQay6AgoGISMXiDgZqJhIREQUDEVkYJicn\nGRkZYXJycq6LkkgKBiIy7+3atYe+vhWsW3cLfX0r2LVrz1wXKXHUZyAi81qjTSYrl/oMREQq8PDD\nf8aJE8vIBAIITyaTaaoZiMi8NTk5yYUXXsZLLxmQRjWD4lQzEJF5a3x8nPb2i4HPAWuAAeANbN58\nB4A6lEMUDERk3ppOMXEFcAj4Azo62ujt7VWHch41E4nIvJZNS93a2sfp00f49Kfv58Mf3tTwHcpq\nJhIRqcD69Tdw5Mgh9u9/mCef/A4dHW20tPShDuVcca10JiKSWL29vezf/wQbNnyQlpZX8cIL/8JC\nzU5ajIKBiMx7k5OTbNjwwVDT0APA6+nuvpyXX/7xgspOWoyCgYjMe4UL19xJV9df8NnP/j7XXXfd\ngg8EoD4DEVkAcheuAXiKM2d+qkAQomAgIvPeQl+4phwaWioiC0a5C9c0wgI3iR1aamZNZnbAzL4V\nvF9qZo+b2WEz22dmi+M6l4hINXp7e1m9enXJB/xCzXAaW83AzD4M/CrQ4+5vM7OtwH+4+wNmthFY\n6u6bIvZTzUBEEqGRMpwmsmZgZucD1wF/Htr8duCx4PVjwDviOJeISL1kRx0txAlpcTUTfRr4CBD+\nib/c3Y8CuPvPgbNjOpeISF1EjTpaKBPSap5nYGa/BRx19++Z2WCJrxZtC9qyZcvU68HBQQYHSx1G\nRKS4Wjt/N2++g098Ys1ULqOkjDpKp9Ok0+m6Hb/mPgMz+wTwO8DLQAroBr4O/Bow6O5HzewcYMjd\nr4jYX30GIhKLbFK6trbML/wdOx5i/fobKt735Mmnufvuj3Dzze9PRCCIEnefQaxDS83sauCOoAP5\nATIdyFvVgSwi9VZL528jdRxnJbIDuYj7gXVmdhh4U/BeRKRik5OTMy5EU0nnb/7xxsfHF3wm01iD\ngbt/293fFrx+1t3Xuvvl7n6tuz8f57lEZGEod9x/uZ2/Ucc7cOB7vPDCoRn3ndfcfU7/MkUQESk0\nMTHhqdQyh+87uMP3PZVa5hMTE5Hf37lzt6dSy7ynZ5WnUst8587dMx6vo2NJsG2rwzKHlQ4p3779\nkdm4xKoFz87YnsXKWioiiTQ5OcnevXuLNt9EteWvX38Da9dew+joKM8//zxLlizJaQp67rnn8rKX\nrqS5+WwyY1/uBG4ExunquomBgavqfo1JomAgIomTHdlTzUI0+/c/wXve835Onz4DnEdz889obm4h\nlbqEkyef5pVXPOd4Z85MYNYU2vYzzpz56cJqIkLBQEQS5uDBg9x44y2cPPltKl2IZnJykptuuoXT\np1uA7wCtnDnzBs6c+VtOnco8/Ftafp329qtpa+sPjrcdgA0bkje3YDYpGIhIYuzatYcbb/xdTp48\nh+mmofIXohkfH6e5eTnQCRwEfhcIH+sgL7/stLW9ilOnnubBBz81NQ9h7dprEp+ptJ6UwlpEEmF6\nrP9XgXcC1c0XuPDCy3jpJSczWPJroWOdC1wOpCs+bhI10jwDEZGyTc8TGAQeAtYAl9HefnXZzTa9\nvb08+uh2mptPAcvyjvWrwC+xkOcSlKKagYgkQuEs4DTt7W9ndPS7XHFFQSabkg4ePMiqVb8e6ndI\n09b2X2hqauWll9KoZlBIfQYikgjZpSlzO3IfqTgQAFxxxRV8/vPb846VybC/0DuKi1HNQEQSpVjW\n0f
B2oOrlK8vdlnSJTlRXVQEUDERkBuGMoi+++EPMmkilLqk4M+lMx47jeLNFwUBEFpTcvoR4RwQ1\nYrbSLI0mEpEFJTcb6ThwEcVGBJWT3bT4sQuPt5AoGIhIouVmI+0H/pWo7KLlZjctfuzc4y00CgYi\nkmjZUUap1Bp6et5Ma+tp2tp+k56eAVKpNezY8RAAGzZ8kBMnhjh27ElOnBhiw4YPzlhDyD329PGS\n3kRUD+ozEJHEKjaCKPy6t7eXkZER1q27hWPHnpzat6dngP37H2b16tUVnadRAoHmGYjIgjDTKJ/w\nQzu3uae87KZhvb29DRME6kU1AxFJnGpG+WSDR3hCWSMMEa2WagYiMu9lR/mEF6EptagNTC9s02jN\nPUlRczAws3bgb4G24HhfcfePm9lSYA/QR2Y82PXufqzW84nI/Fdts4+ae6pX82gidz8JrHH3VcBV\nwFvN7HXAJmC/u18OPAHcVeu5RGRhqGaUT6VzDGrdb76Jtc/AzBaRqSV8APgL4Gp3P2pm5wBpd18R\nsY/6DEQkUrmjfKpNKdGoqSggoekoLLOA6JPAJcCfuvtdZvacuy8NfedZd18Wsa+CgYhUrdqUEo2c\nigIS2oHs7q8Aq8ysB/i6mb0ayH/CF33ib9myZer14OAgg4ODcRRLRBaAajqba9lvrqTTadLpdN2O\nH/vQUjP7KPAi8D5gMNRMNOTuBYnJVTMQkVqoZhCPmjuQzewsM1scvE4B68isRP0t4L3B194DfLPW\nc4mI5Ks2pYRSUeSquWZgZq8BHiMTWJqAPe5+n5ktA74MXAAcITO09PmI/VUzEJGqVLPgTaljNFIg\nSGQHck0FUDAQkSo08kigOCgYiMiC1+jt/XFIXJ+BiEg9RU0Ky12UZhI4SXPzeRUvSqMJZ9MUDEQk\nsYotWDOdruIBYAXwfo4f/xEHDnyv5mMvVGomEpFEmqkp6OGH/4xbbrkd+G7k57UcuxGomUhEFoSZ\n1iceGLiK7u4VRT+v5dgLkYKBiCTSTOsT9/f38/LLR4p+XsuxFyIFAxFJpHImhW3efEdVk8Y04ayQ\n+gxEJNGiJoWF5xicPPk0d9/9EW6++f0VP8wbdcIZaJ6BiCxw86HzNw7qQBaReSVqrH+p8f/q/K0P\nBQMRmTNRY/1nGv+vzt/6UDORiMyJqOaejo6rMWuasQko22fQ2trH6dNHFlxeIkjo4jYiIpWKWlym\nuflsIEVUE1A4GKxffwNr116Tk7F0ZGSkITuCk0LNRCIyJ6Kae86cmeCVV56hnCag3t5eVq9ezf79\nTyitRAzUTCQicyaquQcouwloIY8s0tBSEZlXosb6Fxv/n799ZGSEdetu4dixJ6e+09MzwP79D7N6\n9epZv5bZpGAgIgtS1GI2a9deo5pBXMeb6wexgoFIPBp5Nu1MSjUH7d//xIIcWaRJZyJSYL7n5i81\n0Wz9+huCoPAwR44cWhCBoB5qrhmY2fnAF4HlwCvAn7n7NjNbCuwB+oBx4Hp3Pxaxv2oGIjVYCJ2o\nC+EaK5XEmsHLwP9091cDbwB+z8xWAJuA/e5+OfAEcFcM5xKRPI2cnqHcZSfnIsvoglsS091j/QO+\nAawFDgHLg23nAIeKfN9FpHoTExOeSi1z+L6DO3zfU6llPjExMddFK2nnzt2eSi3zxYsHPJVa5jt3\n7p5xn4mJCR8eHq77tVVTttkWPDtje3bH2oFsZv1AGrgSeMbdl4Y+e9bdl0Xs43GWQWQharT0DElu\n9kly2cISm47CzLqArwC3u/txM8t/whd94m/ZsmXq9eDgIIODg3EVSySxJicnGR0dBWDVqlU15eLP\nT8+QpIdWVri80akozmP37t1cfvnlVd2PuESVLSolxmxLp9Ok0+n6nSCO6gWZoPLXZAJBdttBcpuJ\nDhbZN966k0gD2Llzt7e2djsscrjU29oWV9QU0QjNGGH55d2+/ZG8
pq2tDu1V3484NUqzGzE3E8UV\nDL4I/HHetq3AxuD1RuD+IvvW4z6JJNbExIR3dCxxWFrVA6dRHlZZxcqbDQhdXVc6dFR9P+ohG7x6\nelYlNtjGHQxqHk1kZr8B/HfgGjMbNbMDZvaWIBisM7PDwJuA+2s9l8h8MD4+TnPzcuAiwiOAmprO\nL2sEUKONHipW3oGBqzhy5BB/8id/QCp1IeXej9kY5bMQ5y7U3Gfg7v8XaC7y8dpajy8y3/T393Pm\nzFHAyGTnzHRSvvLKT8paoCU322dm3yQv7lKqvL29vbz00ilOnPgxmdTVpe9HVEqKej2oe3t7E9n3\nUjdxVjOq+UPNRLIAZfoMuoI28kum2sjLHTrZCM0YYcXKO92EtNWhyyHl8CpvaekquKZGax6rN2Ju\nJtLiNiJzIDv6JzyaKJuXv5xfvY0weiisWHmnR+7cCSwF7gQW0dz8i4JjJHWUz3yhRHUiCdAoY9vj\nNn3dXwXeCRS//oV6j4pJYjoKEalRbifrJHCS5ubzyu4Uns3UCXGdKzvv4NOfvp/29rcDZ1GqU3wu\nUlIsKHG2OVXzh/oMRPLazpc5vNYh5du3PzLjvrM55yCuc+Uf55Of/CNvb19SVn/AbKWkSDqSOM+g\npgIoGIi4u/v27Y8EHajld5DOZqdqXOeaad5Bo3SKz7W4g4E6kEUSYmDgKrq7V/DCC9EdpFGL18xm\np2ot5wqXfXR0lKamCyg276BROsXnGwUDkYTo7+/n5ZePEDUev9j4+tmcc1DtucJlf/HFHwJw+vSZ\nyOMsuLH9SRJnNaOaP9RMJDIlajz+TM0zsznnoNJz5ZZ9IpRyYnfw+hI1CVWJJKewroaGlsp8kW0K\n6erq4vjx42U3deQ3/4TfA+zdu5cPfehBXnjhwNQ+PT0D7N//MKtXr448Rr2v75lnngGis63mZydd\nt+4Wjh17EhgBbgGezH6Tzs438rWvfZZrr722LmWez+IeWqqagUgMsr+YU6mLHVKeSr2mrF+8pUbn\nZD/r7n5NxR3LcQuXs7W129vaFpcsc3R20nDNYG6uYz5Bo4lEkmW6KWQoGBZa3sOuVPNP4WdbHVLe\n3X3VrDerFG/qmanMhaOEWlu7vK1tsUYMxSDuYKAOZJEaTY+y6QT6iZo4FdWUsnfvXlpa+iK/D4Q+\nmwTW0Nn5eT772d/nuuuuA2BkZGRWRt2Mj4+HyjJCfnbR5ubz2Lt3L+eee27kaKOLLurjG9/YBWSa\nlbLH1IihhIkzslTzh2oG0uAqrRmU0/wzPeegcBLabC9skzv/Ib9mkK2xrPKOjiXe1rY453paW7sb\nahGeRoKaiUSSJ/uA7ujoD/oMrox8+BVr/unsvNLb23t8+/ZHQt/5aEGw6OhYUrdJZlEzewtnRq90\naPPW1p5gUZpw+Ya8uTnlHR1LvacnExxaW3vUR1AnCgYiCZV9mI6NjRVNl3Dvvfc5XBo8HDN/7e0X\nelvbYu/uzrSjv+tdNwTfGXZYlfPdzs7LvLPztTnbenpW+fDwcE1lL1bbGB4e9sWLB4JzTTgMe1fX\nlb5v3z6/+eYPhK5ldxAsLvO2th6/9977fNOmzQXXGkdZJUPBQKRBRS93OVTw63p6CcihgmanetQM\nKuvIznw2NjYWupboctaytKfMLO5goKylIlWqNHtnJg3DhcDngDXAAPAW2tsvZrpDthPIfuedQA/w\netrbX00qtYZHH90ee+bOwmUpz6Wp6SxGR0eLZgo9fvx4UO7PAYUZR5ubz4641jewefMdU3MpZivL\nqpQpzshSzR+qGUgDqrQTd+fO3d7S0pnXEfslb2/vyfvlPRT5nX379hW05ceVuTP31392ZvClBSuS\nhc+Xu8+YQ27HcdS1dnQs8YmJiVnvAJ+vSGIzEbADOAo8Fdq2FHgcOAzsAxYX2bc+d0qkTirN3pnb\nPBTuiM0dHZQde3/rrbfNevbO
nTt3B2VcVPZ1hcsdnj8wPaqo8Fq1dGV84g4Gcc0z+DzwWeCLoW2b\ngP3u/oCZbQTuCraJNLRS2Tuzn+dnFm1qOhvoIrOs443AOIsWvZeBgatYvXp1wZKQH/vYH5Y1Fr9U\nKoti+0V9Z/36G/ilX1rKb//2nfziF4XNRVHpItauvWZq/sAFF1wwlaIC4Prr7+LUqelr7eq6iYGB\nq4pmLNXSlQkQV1QB+sitGRwClgevzwEOFdmvDjFTpH5mmmmb3/yRGaef7RSO7xdxfnPLrbfePmPz\nS6kmmnKai6KOk5+eIjcFRe79qbT2IcWRxGYijw4Gz+Z9/myR/eK/SyJ1lt+0U+wBODY2Fhqn3xU8\nCC/x1taempp/CgPSkM+Uv6icJppymovKSU+Rv1BN7v1RxtI4xB0MZjMdhRf7YMuWLVOvBwcHGRwc\nnIXiiFQu28Sydu01UwuxdHV1MTw8HJlaYnh4OGhSyjaZjJJKfYBvfOMvc5peKsk6Gp3KohMo3vwS\nvc+5mC1h9+7dvPvd76a3t7es5qLcZrLC9BRRC9Xk7rMSuCbIWLpLGUvLlE6nSafT9TtBXFGFwprB\nQXKbiQ4W2a8eQVMkdlFNLDOllpiuGZT+NV7u6Jri5yteM4jeZ3eopnKpt7Utzhk5VKq5qNzEdWHq\nOI4fCW4m6gf+MfR+K7AxeL0RuL/IfvW4TyKxinqYFU4Ai84sWmpBmEoekjNlMo0ahRS9T0cQFIo/\nxGdqLio2kqhUMJvNRXgWgkQGA2An8FPgJPBjMvXhpcB+MkNLHweWFNm3XvdKJDa5aRkyf1GpIbq6\nrvQvfOELkb+Mo+YFRB23WMqG4eFh7+5eVfJ84fNMTEz4tm3bPJV6TWifCW9vP8/b2i52yL+elVPn\nze67aNHKoufLP1f+PISo641zfsRCl8hgUFMBFAykAZRXM6i86aOSmkFu9tDS3925c7e3tnY7tOU1\nDS11uMhLjW4q3hQ1naG0nBqAJpXVl4KBSEwq/ZUa1cwxUxPQvn37CmYPl3PcqLIWZg+dnsiV/6s8\n08Sz2KcnuuU3+WwNAkVmdFO2z6BUVtX8Por29h4fGxsrUk71DdSbgoFIDKr99VoszXP+tulf5oUd\ntOUeN6xY9tB7772v4DoymVHPd7g81BS0zzNrInhOk8+2bdtyglVUs1VX15X+sY99LNRENZ2htL19\nSc51VdLsJbVRMBCpUb1/vUZnJ63tHOU2U01nC10cqhnUPupnelTUkJdawEc1g9kTdzBQ1lJZUCYn\nJ9m9ezdm51NsuclwRs387JrlZNucTj+RHX8/CZzE7Jypc1QqKnvo3Xd/JC/baCZbaHNzH/AwcAZ4\nEXgD8Aaam0/S1vabM2Y73bz5jqnzdHRczebNd3DWWWexY8dDtLcXZigN37tiWU6VaqIBxBlZqvlD\nNQOZJdNNNymfachkVJqFctI9uOennyhctrIW+SN4StcWHnHodjjX29q6IvsX8u9P9vo6Opb4u971\n7oLrHRsb8/b2JWXVMDRqqL5QM5FI5QqbbgpTIpSeTDXk5Yzkye3o7Sxrn1oU69SuNAdQJektNF8g\nGRQMJFHq8QuwHsccHh72zs5wh6o7THhHx6W+b9++qe9Md34O53132OGyGTtGc4+xzzOjfkrvU851\nV/rZvn37yp4DEf390terX/5zT8FAEqMe48nrNUa9nE7deGsG5XfaznTd1dyT4rOVC3/NR9ckyrte\nmTsKBpII9Rg1Uu+RKJk+g+nMoVHDPUulWSh30ZlKUzWUuu5a7km2HF1d+XMEih0/t+lsLhbZkfLF\nHQxmM2upzCOVLvBSjpkWPplpIZfs+66uLp555hmef/55lixZwqpVq6Yycq5dew2jo6MAU9shOhtp\nf39/wbXkLzpTbKGY8GI1M92Pme5lsc9mWrwmey179+7lQx96kBdemOn4hdlEy11kJ//clf67lw
SI\nM7JU84dqBg2p0gVeZlJuYrRiI3uy71Opix3ag7/yJnxV2zQVV5NWnDWDqDLVq+ZRr/sh5UHNRJIU\n5S7wMtODpVRTRfTDbMij0zcPeSbtQniyVelyVPswjLtJq9QInXJH75QqUxzHL0WTzWZf3MFAzURS\ntXBzSKkFXoo18wAcPHiQHTt2AK8K9itsqhgZGclrKslfyKUTOD/45/LQtulyNDWdn9O0ki3Lc889\nFzp2ZnJYc/N5U01JkNuclFWqaaea5pH8pqXwMUp9Vm6ZopquRkZG6O/vz1nLOOpayxH3/ZA5EGdk\nqeYP1Qwa3kwLvEQ18+zcudtvvfX2oDmn+CQw93LGwH+0oppB/uSqtrbFnjs5rN2bmhaVbGZK4i/h\ncstUamJdPZq6pD5QM5HUS6mx42NjY/6FL3whJ0tlNitn7lDKWz274EpHxxK/9977Ilf7am/v8dw0\nytHr4mbPsWnTXTlNGdmRLtMjZbIP8ws9k40z22dwibe2dvumTZt9z549vmfPnoLA0tSUXezl+w5j\nnpm1W3oIan6Zstc61w+/mZp8qh36Gse5JV4KBlIXpTr/Mr/gU56ZhJTyW2+9ber7mYlcl/p0JssB\nh5Rfc826qeM1Ny8KvuNTfy0tZzlc4PmTwNrbL56aBJaf+bO1tTvngTsxMeE33/yB0LEz2Tw7O1f4\ntm3bfM+ePb5x413BMbLB4fzQ97NlvjB0DT3B++iFX6LKFJW2YS6VCuqlJ9bVnmFUk9Fmj4KBxK5U\nFX9sbMzzm36gwzs6wtkw85tlhkL7FPu83aMWWOnoWDo1yqWcSWKlvlOY1z/8a3jIp7Nvhss45Jmm\npsJjjo2NRZwvfK2FZUyaetYMZHbFHQzUgRyDpIytjhp3HzWmPvz90dFRDh8+XLTj97vf/S65nbUr\ngV5gcWjb7wF7mM7QeYRMh+5KMiue9gGbgTXAecAhWlsv5fTpa4HtZLJqnktz81E+85k/nurYzWT+\n7CK/Izh7TZly9wObgmP3AYfZvPluAPbu3YvZ2YCR26H8OeAtwfez2+4EHgUGgzK9b6pcra0T3H77\nh/j617+O+1lAD7md18XnRiRNNqvohg1raG3t48SJ05j9Jh0dF3P69BFlGF3I4owsUX9k/q87BPwA\n2BjxefwhcxYlZWx11Dj8Uour5DZ3XBT563b79keCtv38mkH+UoqLg+Nk2+1f7dPt+OFkbY8E78P7\njzls9NbWTv/kJ/9o6hpaWjojaw6trd2hcp+dVwP5knd0LJka4trR0Rcco7BDub29Jy/7Zv4v/CFv\na+vy227zJvN7AAAH0UlEQVT7/VAzU8oLF5JvrJpBVn72UzXtNB4aqZkIaAL+hcxPsFbge8CKvO/U\n4z7NiqSMoIgebVN8jdvo5pVs7pqrIuYM3BY88C516PCWlm4vXEoxf33ej3puGuclwefhpRinl2/M\nBoLc5outDtPpI1pauoIHc7iZp9hxwp93ebhDORsY8zs889MvbN/+SEQz0+4goE0fS2kbZC40WjB4\nPfBXofeb8msHjRwMkrLEX2E5hr1UJ2h0Bk/3RYt+ZSqrZeExx7yj40K/77778rJyvjZ0zlV5ZQh3\nEO9z+GXPXYox0+GbPW90x+aEwz5PpS72bdu2hcqd/53hqaycmeMUHqOjo8+3bduWE6zzfxWH30/f\np8Jsp+3tF+UcS7+uZbY1WjB4J/BI6P3vANvyvhP/XZol86tmUN7yhbnDRMMdkFFZPjvyPs9firGc\n4+Z+Z7rcQx61/GK5yzOWe18LawZz9+9ZJEzBIGGSMrY6qsmjVIbOSjN4hq+tWFbOqCyf4XM0Ny8K\nhpkWNtmUOm7+uaePudwh5anUlZHl6+joj/y80vuaOV90mUXmStzBwDLHrA8zez2wxd3fErzfFFzA\n1tB3/J577pnaZ3BwkMHBwbqVqR4afTRRsc+jjhm1HYh8HV
UGyGQnzc8oOtNx87+TPeYFF1zA8ePH\ni5avq6sr8vNKZM9XrMwisyGdTpNOp6fef/zjH8fdLa7j1zsYNAOHgTcBPwOGgfXufjD0Ha9nGURE\n5iMzizUY1HWegbufMbNbyQw2bwJ2hAOBiIgkQ11rBmUVQDUDEZGKxV0zaIrrQCIi0rgUDERERMFA\nREQUDEREBAUDERFBwUBERFAwEBERFAxERAQFAxERQcFARERQMBARERQMREQEBQMREUHBQEREUDAQ\nEREUDEREBAUDERFBwUBERFAwEBERagwGZvbfzOyfzOyMmQ3kfXaXmf3QzA6a2bW1FVNEROqp1prB\nPwL/Ffh2eKOZXQFcD1wBvBV4yMxiW7g5SdLp9FwXoSYq/9xq5PI3ctmh8csft5qCgbsfdvcfAvkP\n+rcDu939ZXcfB34IvK6WcyVVo/8HpfLPrUYufyOXHRq//HGrV5/Bq4BnQu//LdgmIiIJ1DLTF8zs\nb4Dl4U2AA3e7+/+uV8FERGT2mLvXfhCzIeAOdz8QvN8EuLtvDd7/NXCPu/99xL61F0BEZAFy99j6\nYmesGVQgXKhvAV8ys0+TaR66FBiO2inOixERkerUOrT0HWb2DPB64P+Y2V8BuPsY8GVgDNgLfNDj\nqIKIiEhdxNJMJCIija0uo4nMbIeZHTWzp0LbdpvZgeDvX83sQOizyAlqZjZgZk+Z2Q/M7DP1KGut\n5TezPjN7MfTZQ3NZ/iJlf62Z/Z2ZjZrZsJn9WuizRrj3keVP2r0vUf6VZvb/zOz7ZvZNM+sKfdYI\n9z+y/Em7/2Z2vpk9YWb/bGb/aGa3BduXmtnjZnbYzPaZ2eLQPom5/5WWP/b77+6x/wFvBK4Cniry\n+aeAPwxeXwGMkum/6Af+hekay98Dq4PXe4E316O8NZa/r8T3Zr38UWUH9gHXBq/fCgwFr3+lEe59\nifIn6t6XKP8w8Mbg9XuB/9Vg979Y+RN1/4FzgKuC113AYWAFsBW4M9i+Ebg/ife/ivLHev/rUjNw\n9+8Az5X4yvXAzuB15AQ1MzsH6Hb3keB7XwTeUY/y5iuz/LtC7ws6weeq/EXK/gqQ/TW0hMy8D4C3\n0Rj3vlj5IUH3HoqW/5eD7QD7gXcGrxvl/hcrPyTo/rv7z939e8Hr48BB4Hwyz5jHgq89FipLou5/\nFeWHGO//rCeqM7P/DPzc3Z8ONhWboPYq4Ceh7T8hARPXQuX/UWhzf1BNGzKzNwbbklT+DwOfMrMf\nAw8AdwXbG+XeFys/JP/eA/yzmb0teH09mf/BoXHuf7HyQ0Lvv5n1k6nhfBdY7u5HIfPABc4OvpbY\n+19m+SHG+z8XWUvXk/urutHkl/+nwIXuPgDcAewMtwknxAeA2939QjIP1kfnuDyVKlb+n5H8ew9w\nE/B7ZjYCdAKn5rg8lSpW/kTe/6AMXyHz38xxMpNkwxI9aqaC8sd6/+OcZzAjM2sGfhsIZzj9N+CC\n0Pvzg23Fts+ZqPK7+2mCarW7HzCzHwGXkazyv8fdbwdw96+Y2Z8H2xvl3ueXf0fw+hTBgynB9x53\n/wHwZgAz+2Xgt4KPGuL+Fyt/Eu+/mbWQeZD+hbt/M9h81MyWu/vRoAllItieuPtfSfnjvv/1rBkY\nhe1Z64CD7v7T0LZvAe82szYzu4hgglpQHTpmZq8zMwP+B/BNZk9Z5Tezs8ysKXh9MZnyPz3H5c8v\n+7+Z2dVBGd9Epm0UGufe55f/B8HrJN77gvKbWW/wzybgD4HtwUcNcf+LlT+h9/9RYMzdHwxt+xaZ\njm+A94TKksT7X3b5Y7//deoV30mm+eQk8GPgxmD754Hfjfj+XWR68g8SjBoJtv8qmTTZPwQerEdZ\nay0/mZrCPwEHgH8ArpvL8keVHfj1oGyjwN8Bqxrp3hcrf9LufYny30ZmZMgh4BON9t9+sfIn7f4D\nvwGcAb4X/LdyAHgLsI
xMx/dh4HFgSRLvf6Xlj/v+a9KZiIho2UsREVEwEBERFAxERAQFAxERQcFA\nRERQMBARERQMREQEBQMREQH+P+w0nqDLI7YdAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x11ecd67f0>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# now only for \"subjects\"\n",
    "\n",
    "subjdatedist = Counter()\n",
    "\n",
    "for docid, genreset in genresfordocs.items():\n",
    "    date = int(meta.loc[docid, 'inferreddate'])\n",
    "    for g in genreset:\n",
    "        if g.startswith('Subj:'):\n",
    "            subjdatedist[date] += 1\n",
    "            break\n",
    "\n",
    "x = []\n",
    "y = []\n",
    "for date, count in subjdatedist.items():\n",
    "    x.append(date)\n",
    "    y.append(count)\n",
    "\n",
    "plt.scatter(x, y)\n",
    "print(np.amax(list(subjdatedist.keys())))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Gather random contrast sets\n",
    "\n",
    "We gather two contrast sets. (Some predictive modeling we do is going to require that a model trained on contrast set A be applied to a non-overlapping set B.)\n",
    "\n",
    "Note however that we do *not* ensure the sets are non-overlapping with our genre sets. That would create a selection bias. Later on, we can select *within* randomA or randomB to get a negative set that matches a given positive set by date and doesn't overlap with it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "690\n"
     ]
    }
   ],
   "source": [
    "# let's get random fiction that roughly matches that date distribution\n",
    "\n",
    "chosenrandomA = []\n",
    "chosenrandomB = []\n",
    "\n",
    "for date, count in maxforyear.items():\n",
    "    candidates = meta.loc[meta.inferreddate == date, : ]\n",
    "    eligible_indices = candidates.index.tolist()\n",
    "    choiceA = random.sample(eligible_indices, count)\n",
    "    eligible_indices = list(set(eligible_indices) - set(choiceA))\n",
    "    if len(eligible_indices) < count:\n",
    "        print(\"error, running dry\")\n",
    "        break\n",
    "    else:\n",
    "        choiceB = random.sample(eligible_indices, count)\n",
    "    \n",
    "    chosenrandomA.extend(choiceA)\n",
    "    chosenrandomB.extend(choiceB)\n",
    "        \n",
    "print(len(chosenrandomA))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "5585\n",
      "6872\n"
     ]
    }
   ],
   "source": [
    "# integrate the random choices into genresfordocs:\n",
    "print(len(genresfordocs))\n",
    "\n",
    "for docid in chosenrandomA:\n",
    "    if docid not in genresfordocs:\n",
    "        genresfordocs[docid] = {'randomA'}\n",
    "    else:\n",
    "        genresfordocs[docid].add('randomA')\n",
    "\n",
    "for docid in chosenrandomB:\n",
    "    if docid not in genresfordocs:\n",
    "        genresfordocs[docid] = {'randomB'}\n",
    "    else:\n",
    "        genresfordocs[docid].add('randomB')\n",
    "\n",
    "print(len(genresfordocs))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Write the sizes and mean dates of genres as selected. We might need this later."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [],
   "source": [
    "with open('../metadata/selected_genres.tsv', mode = 'w', encoding = 'utf-8') as f:\n",
    "    writer = csv.DictWriter(f, delimiter = '\\t', fieldnames = ['genre', 'numvols', 'meandate'])\n",
    "    writer.writeheader()\n",
    "    for genrename, examples in genredist.items():\n",
    "        dates = meta.loc[examples, 'inferreddate']\n",
    "        outrow = dict()\n",
    "        outrow['genre'] = genrename\n",
    "        outrow['numvols'] = len(examples)\n",
    "        outrow['meandate'] = round(np.mean(dates), 2)\n",
    "        writer.writerow(outrow)\n",
    "    examples = chosenrandomA\n",
    "    dates = meta.loc[examples, 'inferreddate']\n",
    "    outrow = dict()\n",
    "    outrow['genre'] = 'randomA'\n",
    "    outrow['numvols'] = len(examples)\n",
    "    outrow['meandate'] = round(np.mean(dates), 2)\n",
    "    writer.writerow(outrow)\n",
    "    examples = chosenrandomB\n",
    "    dates = meta.loc[examples, 'inferreddate']\n",
    "    outrow = dict()\n",
    "    outrow['genre'] = 'randomB'\n",
    "    outrow['numvols'] = len(examples)\n",
    "    outrow['meandate'] = round(np.mean(dates), 2)\n",
    "    writer.writerow(outrow)\n",
    "    \n",
    "    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Actually write the main metadata to disk."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [],
   "source": [
    "# okay, now let's write the thing\n",
    "\n",
    "allchosen = list(genresfordocs.keys())\n",
    "\n",
    "chosendf = meta.loc[allchosen, ['author', 'shorttitle', 'enumcron', 'inferreddate', 'genres', 'subjects', 'allcopiesofwork']]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(6872, 7)"
      ]
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chosendf.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a column for the genre tags.\n",
    "\n",
    "def mygenres(idx):\n",
    "    global genresfordocs\n",
    "    if idx not in genresfordocs:\n",
    "        print(\"ERROR\")\n",
    "        return float('nan')\n",
    "    else:\n",
    "        return '|'.join(genresfordocs[idx])\n",
    "\n",
    "chosendf = chosendf.assign(tags = chosendf.index.map(mygenres))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [],
   "source": [
    "chosendf.rename(index=str, columns={\"inferreddate\": \"firstpub\"}, inplace = True)\n",
    "\n",
    "# No, these are not really dates of first publication. But my modeling code has\n",
    "# evolved to use 'firstpub' as a default term and I don't want to fiddle with it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>author</th>\n",
       "      <th>shorttitle</th>\n",
       "      <th>enumcron</th>\n",
       "      <th>firstpub</th>\n",
       "      <th>genres</th>\n",
       "      <th>subjects</th>\n",
       "      <th>allcopiesofwork</th>\n",
       "      <th>tags</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>docid</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>mdp.39015041290811</th>\n",
       "      <td>MacLean, Rory</td>\n",
       "      <td>The oatmeal ark : from the Western Isles to a ...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1997</td>\n",
       "      <td>Fiction|Fantastic fiction</td>\n",
       "      <td>Fiction|Voyages and travels</td>\n",
       "      <td>1</td>\n",
       "      <td>Fantasy</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>uc2.ark+=13960=t9h41km75</th>\n",
       "      <td>Stephens, C. A. (Charles Asbury)</td>\n",
       "      <td>A busy year at the old squire's</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1922</td>\n",
       "      <td>NotFiction</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2</td>\n",
       "      <td>randomB</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>uc1.$b299829</th>\n",
       "      <td>Clewes, Winston</td>\n",
       "      <td>Sweet river in the morning, a novel</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1946</td>\n",
       "      <td>Fiction</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>randomA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mdp.39015069154246</th>\n",
       "      <td>Bouldrey, Brian</td>\n",
       "      <td>Love, the magician</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2000</td>\n",
       "      <td>Novel|Psychological fiction</td>\n",
       "      <td>Fiction|Gay men</td>\n",
       "      <td>1</td>\n",
       "      <td>Psychological B</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>uiuo.ark+=13960=t6j10p77j</th>\n",
       "      <td>Opie, Amelia</td>\n",
       "      <td>Temper ; or, Domestic scenes</td>\n",
       "      <td>v.3</td>\n",
       "      <td>1812</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3</td>\n",
       "      <td>randomA</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                     author  \\\n",
       "docid                                                         \n",
       "mdp.39015041290811                            MacLean, Rory   \n",
       "uc2.ark+=13960=t9h41km75   Stephens, C. A. (Charles Asbury)   \n",
       "uc1.$b299829                                Clewes, Winston   \n",
       "mdp.39015069154246                          Bouldrey, Brian   \n",
       "uiuo.ark+=13960=t6j10p77j                      Opie, Amelia   \n",
       "\n",
       "                                                                  shorttitle  \\\n",
       "docid                                                                          \n",
       "mdp.39015041290811         The oatmeal ark : from the Western Isles to a ...   \n",
       "uc2.ark+=13960=t9h41km75                     A busy year at the old squire's   \n",
       "uc1.$b299829                             Sweet river in the morning, a novel   \n",
       "mdp.39015069154246                                        Love, the magician   \n",
       "uiuo.ark+=13960=t6j10p77j                       Temper ; or, Domestic scenes   \n",
       "\n",
       "                          enumcron  firstpub                       genres  \\\n",
       "docid                                                                       \n",
       "mdp.39015041290811             NaN      1997    Fiction|Fantastic fiction   \n",
       "uc2.ark+=13960=t9h41km75       NaN      1922                   NotFiction   \n",
       "uc1.$b299829                   NaN      1946                      Fiction   \n",
       "mdp.39015069154246             NaN      2000  Novel|Psychological fiction   \n",
       "uiuo.ark+=13960=t6j10p77j      v.3      1812                          NaN   \n",
       "\n",
       "                                              subjects  allcopiesofwork  \\\n",
       "docid                                                                     \n",
       "mdp.39015041290811         Fiction|Voyages and travels                1   \n",
       "uc2.ark+=13960=t9h41km75                           NaN                2   \n",
       "uc1.$b299829                                       NaN                1   \n",
       "mdp.39015069154246                     Fiction|Gay men                1   \n",
       "uiuo.ark+=13960=t6j10p77j                          NaN                3   \n",
       "\n",
       "                                      tags  \n",
       "docid                                       \n",
       "mdp.39015041290811                 Fantasy  \n",
       "uc2.ark+=13960=t9h41km75           randomB  \n",
       "uc1.$b299829                       randomA  \n",
       "mdp.39015069154246         Psychological B  \n",
       "uiuo.ark+=13960=t6j10p77j          randomA  "
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chosendf.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [],
   "source": [
    "chosendf.to_csv('../metadata/genremeta.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Measures of social proximity\n",
    "\n",
    "We can also use metadata to generate estimates of the *social* proximity of two genres. I'm going to do this by measuring the pointwise mutual information of their co-association with volumes.\n",
    "    \n",
    "    log(p(x, y)/p(x)p(y)).\n",
    "    \n",
    "Where p(x) is, for instance, the probability that a volume is labeled \"Humor,\" p(y) the probability that a volume is labeled \"Horror,\" and p(x, y) the probability that the same volume is labeled \"Humor\" and \"Horror.\"\n",
    "\n",
    "However, we also want to acknowledge that the probability of two categories overlapping declines when they have largely disjoint chronological distributions. The \"subjects\" we're considering are, by and large, assigned earlier than the \"genres,\" and they will tend not to be co-associated for that reason.\n",
    "\n",
    "So I'm proposing a conditional version of PMI, where p(x), p(y), and p(x, y) are all measured *within* a sample of volumes defined by the chronological distribution of x and y. This will give a boost to PMI in cases where the chronological distributions are spread out over a wide swath of the timeline, because p(x), p(y), and p(x, y) are all lowered by the same amount, but that amount gets basically squared in the denominator: ```p(x)p(y)```. At least that's how I think it will work! It conforms intuitively to the notion that coincidence of x and y is less likely when the events are rare within the frame of reference (of say 100 years), and more likely when \"the relevant framework\" is just (say) 30 years where the genres substantially coincide.\n",
    "\n",
    "#### Priors\n",
    "\n",
    "It also makes sense to define certain priors about the relationships of genres. Catalogers may not specify the subject \"humor\" if they've already put \"humor\" in the genre field. But we can see that those tags are substantially equivalent.\n",
    "\n",
    "I don't want to hard-code a fixed value, so I have expressed this as a strong prior that \"bends\" evidence in its direction. That also permits the possibility that the subject \"humor\" may actually mean something different from the genre \"humor,\" if only because the tags were assigned in different periods. This is also why I have expressed the expected similarity as 80% (see below) rather than a firm postulate of identity.\n",
    "\n",
    "Note that this is not a rigorous Bayesian set up where every measure of social proximity is going to be calculated ```posterior = prior x likelihood.``` Most of the PMI values will be calculated directly from the evidence. The priors are just an ad-hoc fix for particular pairs of genres that we expect to be very similar."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 117,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'Fantasy': {'Subj: Fantasy': 0.8},\n",
       " 'Historical': {'Subj: History': 0.8},\n",
       " 'Horror': {'Subj: Horror': 0.8},\n",
       " 'Humor': {'Subj: Humor': 0.8},\n",
       " 'Juvenile': {'Subj: Juvenile': 0.8},\n",
       " 'Love': {'Subj: Man-woman': 0.6},\n",
       " 'Mystery': {'Subject: Detective': 0.8},\n",
       " 'SF': {'Subj: SF, American': 0.8, 'Subj: SF, Other': 0.8},\n",
       " 'Short stories': {'Subj: Short stories, American': 0.8,\n",
       "  'Subj: Short stories, Other': 0.8},\n",
       " 'Subj: Fantasy': {'Fantasy': 0.8},\n",
       " 'Subj: History': {'Historical': 0.8},\n",
       " 'Subj: Horror': {'Horror': 0.8},\n",
       " 'Subj: Humor': {'Humor': 0.8},\n",
       " 'Subj: Juvenile': {'Juvenile': 0.8},\n",
       " 'Subj: Man-woman': {'Love': 0.6},\n",
       " 'Subj: SF, American': {'SF': 0.8, 'Subj: SF, Other': 0.8},\n",
       " 'Subj: SF, Other': {'SF': 0.8, 'Subj: SF, American': 0.8},\n",
       " 'Subj: Short stories, American': {'Short stories': 0.8,\n",
       "  'Subj: Short stories, Other': 0.8},\n",
       " 'Subj: Short stories, Other': {'Short stories': 0.8,\n",
       "  'Subj: Short stories, American': 0.8},\n",
       " 'Subject: Detective': {'Mystery': 0.8}}"
      ]
     },
     "execution_count": 117,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The list of priors\n",
    "\n",
    "priors = {'Subj: Horror': {'Horror': 0.8},\n",
    "          'Subj: Humor': {'Humor': 0.8},\n",
    "          'Subj: History': {'Historical': 0.8},\n",
    "          'Subj: Man-woman': {'Love': 0.6}, \n",
    "          'Subj: Short stories, American': {'Short stories': 0.8, 'Subj: Short stories, Other': 0.8},\n",
    "          'Subj: Short stories, Other': {'Short stories': 0.8, 'Subj: Short stories, American': 0.8},\n",
    "          'Subj: SF, American': {'SF': 0.8, 'Subj: SF, Other' : 0.8},\n",
    "          'Subj: SF, Other': {'SF': 0.8, 'Subj: SF, American': 0.8},\n",
    "          'Subj: Fantasy': {'Fantasy': 0.8}, \n",
    "          'Subj: Juvenile': {'Juvenile': 0.8},\n",
    "          'Subject: Detective': {'Mystery': 0.8}\n",
    "         }\n",
    "\n",
    "# Let's turn that into a symmetric dictionary where A -> B\n",
    "# also implies B -> A\n",
    "tuplelist = []\n",
    "for key1, matches in priors.items():\n",
    "    for match, realnumber in matches.items():\n",
    "        tuplelist.append((key1, match, realnumber))\n",
    "        \n",
    "for t in tuplelist:\n",
    "    key1, match, realnumber = t\n",
    "    if match not in priors:\n",
    "        priors[match] = dict()\n",
    "    priors[match][key1] = realnumber\n",
    "    \n",
    "priors"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 118,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Short stories\n",
      "Subj: Humor\n",
      "Suspense\n",
      "Subj: Detective\n",
      "Domestic\n",
      "Christian\n",
      "Horror\n",
      "Juvenile\n",
      "Historical\n",
      "Subj: Juvenile\n",
      "Fantasy\n",
      "Subj: SF, Other\n",
      "Subj: Man-woman\n",
      "Adventure\n",
      "Humor\n",
      "Love\n",
      "Novel\n",
      "Western\n",
      "Political\n",
      "SF\n",
      "Subj: Horror\n",
      "Bildungsroman\n",
      "Subj: Short stories, American\n",
      "Subj: History\n",
      "Subj: Fantasy\n",
      "Subj: Fairy tales\n",
      "Subj: Short stories, Other\n",
      "War\n",
      "Psychological\n",
      "Subj: SF, American\n",
      "Biographical\n",
      "Mystery\n"
     ]
    }
   ],
   "source": [
    "def sample_bag_of_years(bag_of_years, yeardict):\n",
    "    sample = []\n",
    "    errors = 0\n",
    "    for i in range(25000):\n",
    "        b = random.choice(bag_of_years)\n",
    "        if pd.isnull(b):\n",
    "            errors += 1\n",
    "        else:\n",
    "            vol = random.choice(yeardict[b])\n",
    "            sample.append(vol)\n",
    "    return sample\n",
    "\n",
    "def labelprob(sample, label, theprior, otherlabel):\n",
    "    global category_dict\n",
    "    count = 0\n",
    "    bothcount = 0\n",
    "    \n",
    "    for s in sample:\n",
    "        if s in category_dict[label]:\n",
    "            count += 1\n",
    "            bothcount += 1\n",
    "        elif theprior > 0 and theprior < 1 and s in category_dict[otherlabel]:\n",
    "            bothcount += 1\n",
    "    \n",
    "    if theprior > 0 and theprior < 1:\n",
    "        count = (count + (bothcount * theprior)) / 2\n",
    "        \n",
    "    # the addition of 0.1 is Laplacian smoothing\n",
    "    return (count + 0.1) / len(sample)\n",
    "\n",
    "def labelxyprob(sample, labelx, labely, theprior):\n",
    "    global category_dict\n",
    "    count = 0\n",
    "    candidates = 0\n",
    "    for s in sample:\n",
    "        if s in category_dict[labelx] and s in category_dict[labely]:\n",
    "            count += 1\n",
    "            candidates += 1\n",
    "        elif s in category_dict[labelx]:\n",
    "            candidates += 1\n",
    "        elif s in category_dict[labely]:\n",
    "            candidates += 1\n",
    "    \n",
    "    if theprior > 0 and theprior < 1:\n",
    "        priorcount = candidates * theprior\n",
    "        count = (count + priorcount) / 2\n",
    "        \n",
    "    # This prior is informative only in a small number of cases\n",
    "    # where we expect a strong match, and know that the data\n",
    "    # will underrepresent the match, because of the difference\n",
    "    # of \"subjects\" and \"genres.\"\n",
    "    \n",
    "    # Most comparisons will have\n",
    "    # no prior, because we don't actually let theprior == 0\n",
    "    # drag a comparison down below observed evidence.\n",
    "        \n",
    "    # the addition of 0.1 is Laplacian smoothing\n",
    "    return (count + 0.1) / len(sample)\n",
    "\n",
    "yeardict = dict()\n",
    "for yr in range(1700, 2100):\n",
    "    yeardict[yr] = meta.index[meta['inferreddate'] == yr].tolist()\n",
    "\n",
    "pmidict = dict()\n",
    "\n",
    "for name1, ex1 in category_dict.items():\n",
    "    print(name1)\n",
    "    for name2, ex2 in category_dict.items():\n",
    "        \n",
    "        if name1 not in pmidict:\n",
    "            pmidict[name1] = dict()\n",
    "            \n",
    "        if name2 in pmidict and name1 in pmidict[name2]:\n",
    "            pmidict[name1][name2] = pmidict[name2][name1]\n",
    "            \n",
    "        else:\n",
    "            bag1 = meta.loc[ex1, 'inferreddate']\n",
    "            bag2 = meta.loc[ex2, 'inferreddate']\n",
    "            jointbag = list(bag1) + list(bag2)\n",
    "            sample_of_docids = sample_bag_of_years(jointbag, yeardict)\n",
    "            \n",
    "            if name1 == name2:\n",
    "                theprior = 1\n",
    "                # There is no meaningful definition of PMI where two categories\n",
    "                # perfectly coincide. We're going to improvise, in a way that\n",
    "                # allows small categories to be more self-similar than\n",
    "                # large ones.\n",
    "                \n",
    "            elif name1 in priors and name2 in priors[name1]:\n",
    "                theprior = priors[name1][name2]\n",
    "            else:\n",
    "                theprior = 0\n",
    "            \n",
    "            \n",
    "            prob2 = labelprob(sample_of_docids, name2, theprior, name1)\n",
    "            prob1 = labelprob(sample_of_docids, name1, theprior, name2) \n",
    "            jointprob = labelxyprob(sample_of_docids, name1, name2, theprior)\n",
    "            pmidict[name1][name2] = math.log(jointprob / (prob1 * prob2))\n",
    "    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 119,
   "metadata": {},
   "outputs": [],
   "source": [
    "pmidf = pd.DataFrame(pmidict)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 120,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Humor 4.0896471879\n"
     ]
    }
   ],
   "source": [
    "for idx in pmidf.index:\n",
    "    maximum = max(pmidf.loc[idx, : ])\n",
    "    if maximum > pmidf.loc[idx, idx]:\n",
    "        print(idx, maximum)\n",
    "        pmidf.loc[idx, idx] = maximum"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 132,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.094113113857693637"
      ]
     },
     "execution_count": 132,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pmidf['Adventure']['Mystery']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 121,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Adventure</th>\n",
       "      <th>Bildungsroman</th>\n",
       "      <th>Biographical</th>\n",
       "      <th>Christian</th>\n",
       "      <th>Domestic</th>\n",
       "      <th>Fantasy</th>\n",
       "      <th>Historical</th>\n",
       "      <th>Horror</th>\n",
       "      <th>Humor</th>\n",
       "      <th>Juvenile</th>\n",
       "      <th>...</th>\n",
       "      <th>Subj: Humor</th>\n",
       "      <th>Subj: Juvenile</th>\n",
       "      <th>Subj: Man-woman</th>\n",
       "      <th>Subj: SF, American</th>\n",
       "      <th>Subj: SF, Other</th>\n",
       "      <th>Subj: Short stories, American</th>\n",
       "      <th>Subj: Short stories, Other</th>\n",
       "      <th>Suspense</th>\n",
       "      <th>War</th>\n",
       "      <th>Western</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Adventure</th>\n",
       "      <td>4.279903</td>\n",
       "      <td>0.435610</td>\n",
       "      <td>0.343080</td>\n",
       "      <td>0.221690</td>\n",
       "      <td>0.305965</td>\n",
       "      <td>0.588636</td>\n",
       "      <td>1.379508</td>\n",
       "      <td>-3.243798</td>\n",
       "      <td>1.227815</td>\n",
       "      <td>1.130951</td>\n",
       "      <td>...</td>\n",
       "      <td>-2.400227</td>\n",
       "      <td>0.936398</td>\n",
       "      <td>0.441526</td>\n",
       "      <td>-2.845089</td>\n",
       "      <td>1.117171</td>\n",
       "      <td>-2.666464</td>\n",
       "      <td>-3.010117</td>\n",
       "      <td>-1.688237</td>\n",
       "      <td>0.795667</td>\n",
       "      <td>-2.159260</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Bildungsroman</th>\n",
       "      <td>0.435610</td>\n",
       "      <td>4.053356</td>\n",
       "      <td>-3.165622</td>\n",
       "      <td>1.131953</td>\n",
       "      <td>1.720973</td>\n",
       "      <td>-0.694249</td>\n",
       "      <td>1.093455</td>\n",
       "      <td>-0.328167</td>\n",
       "      <td>1.701955</td>\n",
       "      <td>-3.335126</td>\n",
       "      <td>...</td>\n",
       "      <td>-2.928306</td>\n",
       "      <td>-3.135787</td>\n",
       "      <td>0.821285</td>\n",
       "      <td>-3.273126</td>\n",
       "      <td>-2.619720</td>\n",
       "      <td>-3.338657</td>\n",
       "      <td>-3.604928</td>\n",
       "      <td>-0.693552</td>\n",
       "      <td>2.265967</td>\n",
       "      <td>0.977699</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Biographical</th>\n",
       "      <td>0.343080</td>\n",
       "      <td>-3.165622</td>\n",
       "      <td>4.916599</td>\n",
       "      <td>3.110273</td>\n",
       "      <td>0.009942</td>\n",
       "      <td>-2.894851</td>\n",
       "      <td>2.114180</td>\n",
       "      <td>-2.517358</td>\n",
       "      <td>-2.962145</td>\n",
       "      <td>-1.474224</td>\n",
       "      <td>...</td>\n",
       "      <td>-1.756554</td>\n",
       "      <td>-1.443649</td>\n",
       "      <td>-2.944366</td>\n",
       "      <td>-2.256043</td>\n",
       "      <td>-1.651594</td>\n",
       "      <td>-2.421566</td>\n",
       "      <td>-2.428072</td>\n",
       "      <td>-3.985702</td>\n",
       "      <td>1.844265</td>\n",
       "      <td>2.425606</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Christian</th>\n",
       "      <td>0.221690</td>\n",
       "      <td>1.131953</td>\n",
       "      <td>3.110273</td>\n",
       "      <td>5.416200</td>\n",
       "      <td>1.049163</td>\n",
       "      <td>2.156130</td>\n",
       "      <td>1.000321</td>\n",
       "      <td>-2.123703</td>\n",
       "      <td>-2.363184</td>\n",
       "      <td>-1.696563</td>\n",
       "      <td>...</td>\n",
       "      <td>-1.407832</td>\n",
       "      <td>-1.033527</td>\n",
       "      <td>1.248363</td>\n",
       "      <td>-1.256016</td>\n",
       "      <td>-1.016478</td>\n",
       "      <td>-1.682466</td>\n",
       "      <td>-1.673172</td>\n",
       "      <td>-0.660272</td>\n",
       "      <td>-1.463414</td>\n",
       "      <td>-0.668447</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Domestic</th>\n",
       "      <td>0.305965</td>\n",
       "      <td>1.720973</td>\n",
       "      <td>0.009942</td>\n",
       "      <td>1.049163</td>\n",
       "      <td>3.189220</td>\n",
       "      <td>-2.378864</td>\n",
       "      <td>1.054664</td>\n",
       "      <td>-1.374636</td>\n",
       "      <td>0.861812</td>\n",
       "      <td>-4.720872</td>\n",
       "      <td>...</td>\n",
       "      <td>-3.603519</td>\n",
       "      <td>-4.332381</td>\n",
       "      <td>1.263024</td>\n",
       "      <td>-3.971152</td>\n",
       "      <td>-3.427581</td>\n",
       "      <td>-1.001827</td>\n",
       "      <td>-4.536183</td>\n",
       "      <td>-0.428196</td>\n",
       "      <td>0.661536</td>\n",
       "      <td>0.218978</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Fantasy</th>\n",
       "      <td>0.588636</td>\n",
       "      <td>-0.694249</td>\n",
       "      <td>-2.894851</td>\n",
       "      <td>2.156130</td>\n",
       "      <td>-2.378864</td>\n",
       "      <td>4.234711</td>\n",
       "      <td>0.909377</td>\n",
       "      <td>2.179593</td>\n",
       "      <td>0.245513</td>\n",
       "      <td>-0.639921</td>\n",
       "      <td>...</td>\n",
       "      <td>-2.528873</td>\n",
       "      <td>-2.494904</td>\n",
       "      <td>0.011914</td>\n",
       "      <td>-0.423531</td>\n",
       "      <td>0.611025</td>\n",
       "      <td>-3.223219</td>\n",
       "      <td>-3.228188</td>\n",
       "      <td>0.249198</td>\n",
       "      <td>-0.496345</td>\n",
       "      <td>-2.330399</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Historical</th>\n",
       "      <td>1.379508</td>\n",
       "      <td>1.093455</td>\n",
       "      <td>2.114180</td>\n",
       "      <td>1.000321</td>\n",
       "      <td>1.054664</td>\n",
       "      <td>0.909377</td>\n",
       "      <td>3.489242</td>\n",
       "      <td>0.085551</td>\n",
       "      <td>-1.092102</td>\n",
       "      <td>-1.965720</td>\n",
       "      <td>...</td>\n",
       "      <td>-3.201279</td>\n",
       "      <td>-3.927214</td>\n",
       "      <td>0.290489</td>\n",
       "      <td>-3.801247</td>\n",
       "      <td>-3.134623</td>\n",
       "      <td>-4.045185</td>\n",
       "      <td>-4.163451</td>\n",
       "      <td>0.117043</td>\n",
       "      <td>2.682408</td>\n",
       "      <td>1.667594</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Horror</th>\n",
       "      <td>-3.243798</td>\n",
       "      <td>-0.328167</td>\n",
       "      <td>-2.517358</td>\n",
       "      <td>-2.123703</td>\n",
       "      <td>-1.374636</td>\n",
       "      <td>2.179593</td>\n",
       "      <td>0.085551</td>\n",
       "      <td>4.612799</td>\n",
       "      <td>-3.410408</td>\n",
       "      <td>-2.254403</td>\n",
       "      <td>...</td>\n",
       "      <td>-2.001259</td>\n",
       "      <td>-1.756085</td>\n",
       "      <td>-3.296209</td>\n",
       "      <td>-2.493054</td>\n",
       "      <td>-1.911615</td>\n",
       "      <td>-2.608324</td>\n",
       "      <td>-0.432951</td>\n",
       "      <td>0.245167</td>\n",
       "      <td>-2.487540</td>\n",
       "      <td>0.969952</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Humor</th>\n",
       "      <td>1.227815</td>\n",
       "      <td>1.701955</td>\n",
       "      <td>-2.962145</td>\n",
       "      <td>-2.363184</td>\n",
       "      <td>0.861812</td>\n",
       "      <td>0.245513</td>\n",
       "      <td>-1.092102</td>\n",
       "      <td>-3.410408</td>\n",
       "      <td>4.089647</td>\n",
       "      <td>-0.521751</td>\n",
       "      <td>...</td>\n",
       "      <td>4.089647</td>\n",
       "      <td>-0.587908</td>\n",
       "      <td>1.357892</td>\n",
       "      <td>-2.979538</td>\n",
       "      <td>-2.398768</td>\n",
       "      <td>-3.169652</td>\n",
       "      <td>-3.294340</td>\n",
       "      <td>-0.207172</td>\n",
       "      <td>-2.925803</td>\n",
       "      <td>-2.301749</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Juvenile</th>\n",
       "      <td>1.130951</td>\n",
       "      <td>-3.335126</td>\n",
       "      <td>-1.474224</td>\n",
       "      <td>-1.696563</td>\n",
       "      <td>-4.720872</td>\n",
       "      <td>-0.639921</td>\n",
       "      <td>-1.965720</td>\n",
       "      <td>-2.254403</td>\n",
       "      <td>-0.521751</td>\n",
       "      <td>3.087760</td>\n",
       "      <td>...</td>\n",
       "      <td>-3.941342</td>\n",
       "      <td>3.026684</td>\n",
       "      <td>-1.469373</td>\n",
       "      <td>-2.882120</td>\n",
       "      <td>-2.457515</td>\n",
       "      <td>-3.453150</td>\n",
       "      <td>-3.962527</td>\n",
       "      <td>-3.733282</td>\n",
       "      <td>0.312924</td>\n",
       "      <td>-1.991889</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Love</th>\n",
       "      <td>0.454138</td>\n",
       "      <td>0.840889</td>\n",
       "      <td>1.066856</td>\n",
       "      <td>2.012803</td>\n",
       "      <td>0.651874</td>\n",
       "      <td>0.296712</td>\n",
       "      <td>0.995560</td>\n",
       "      <td>-0.121174</td>\n",
       "      <td>1.410084</td>\n",
       "      <td>-4.539916</td>\n",
       "      <td>...</td>\n",
       "      <td>-1.133929</td>\n",
       "      <td>-4.039011</td>\n",
       "      <td>3.335897</td>\n",
       "      <td>-3.661264</td>\n",
       "      <td>-2.987632</td>\n",
       "      <td>-1.668833</td>\n",
       "      <td>-4.232899</td>\n",
       "      <td>-1.385372</td>\n",
       "      <td>1.073255</td>\n",
       "      <td>0.930612</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mystery</th>\n",
       "      <td>0.094113</td>\n",
       "      <td>-2.462436</td>\n",
       "      <td>-3.930831</td>\n",
       "      <td>-0.743228</td>\n",
       "      <td>-1.086038</td>\n",
       "      <td>-0.131421</td>\n",
       "      <td>0.971470</td>\n",
       "      <td>-0.095212</td>\n",
       "      <td>0.321914</td>\n",
       "      <td>-2.342038</td>\n",
       "      <td>...</td>\n",
       "      <td>-3.605074</td>\n",
       "      <td>-4.371259</td>\n",
       "      <td>-2.335491</td>\n",
       "      <td>-3.988290</td>\n",
       "      <td>-1.118957</td>\n",
       "      <td>-4.342293</td>\n",
       "      <td>-4.559014</td>\n",
       "      <td>0.728169</td>\n",
       "      <td>-0.609320</td>\n",
       "      <td>-3.262348</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Novel</th>\n",
       "      <td>0.298461</td>\n",
       "      <td>0.494692</td>\n",
       "      <td>0.200954</td>\n",
       "      <td>-0.011152</td>\n",
       "      <td>0.549282</td>\n",
       "      <td>0.623512</td>\n",
       "      <td>0.595649</td>\n",
       "      <td>0.910913</td>\n",
       "      <td>0.423314</td>\n",
       "      <td>-1.383259</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.806427</td>\n",
       "      <td>-0.997143</td>\n",
       "      <td>0.698427</td>\n",
       "      <td>-0.771234</td>\n",
       "      <td>-0.333310</td>\n",
       "      <td>-2.204803</td>\n",
       "      <td>-3.078950</td>\n",
       "      <td>0.921370</td>\n",
       "      <td>0.711343</td>\n",
       "      <td>0.071245</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Political</th>\n",
       "      <td>2.022527</td>\n",
       "      <td>-0.045063</td>\n",
       "      <td>-2.040077</td>\n",
       "      <td>1.845372</td>\n",
       "      <td>1.366772</td>\n",
       "      <td>-2.819496</td>\n",
       "      <td>0.855458</td>\n",
       "      <td>-2.499792</td>\n",
       "      <td>1.129610</td>\n",
       "      <td>-1.893787</td>\n",
       "      <td>...</td>\n",
       "      <td>-1.483921</td>\n",
       "      <td>-1.540766</td>\n",
       "      <td>-0.280696</td>\n",
       "      <td>0.272348</td>\n",
       "      <td>-1.380026</td>\n",
       "      <td>-1.981697</td>\n",
       "      <td>-2.217211</td>\n",
       "      <td>1.202613</td>\n",
       "      <td>-2.108797</td>\n",
       "      <td>-1.540723</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Psychological</th>\n",
       "      <td>0.273890</td>\n",
       "      <td>0.759166</td>\n",
       "      <td>0.313571</td>\n",
       "      <td>-0.164644</td>\n",
       "      <td>1.565642</td>\n",
       "      <td>0.342645</td>\n",
       "      <td>0.071484</td>\n",
       "      <td>0.120266</td>\n",
       "      <td>0.127164</td>\n",
       "      <td>-4.568209</td>\n",
       "      <td>...</td>\n",
       "      <td>-3.639832</td>\n",
       "      <td>-4.183314</td>\n",
       "      <td>0.755981</td>\n",
       "      <td>-4.100278</td>\n",
       "      <td>-1.079347</td>\n",
       "      <td>-4.361342</td>\n",
       "      <td>-4.442837</td>\n",
       "      <td>0.823388</td>\n",
       "      <td>-0.466791</td>\n",
       "      <td>-3.414557</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>SF</th>\n",
       "      <td>0.999594</td>\n",
       "      <td>-1.056902</td>\n",
       "      <td>-2.721311</td>\n",
       "      <td>-1.934659</td>\n",
       "      <td>-4.521642</td>\n",
       "      <td>1.119701</td>\n",
       "      <td>-1.909738</td>\n",
       "      <td>1.006739</td>\n",
       "      <td>-0.029564</td>\n",
       "      <td>-2.488945</td>\n",
       "      <td>...</td>\n",
       "      <td>-2.293161</td>\n",
       "      <td>-2.652264</td>\n",
       "      <td>-0.978012</td>\n",
       "      <td>4.050927</td>\n",
       "      <td>4.230921</td>\n",
       "      <td>-2.961598</td>\n",
       "      <td>-3.237803</td>\n",
       "      <td>0.181503</td>\n",
       "      <td>0.686332</td>\n",
       "      <td>-2.099872</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Short stories</th>\n",
       "      <td>-4.082887</td>\n",
       "      <td>-4.822782</td>\n",
       "      <td>-3.903740</td>\n",
       "      <td>-3.422832</td>\n",
       "      <td>-2.103334</td>\n",
       "      <td>0.277950</td>\n",
       "      <td>-5.465095</td>\n",
       "      <td>1.369402</td>\n",
       "      <td>-4.697734</td>\n",
       "      <td>-2.311500</td>\n",
       "      <td>...</td>\n",
       "      <td>0.243638</td>\n",
       "      <td>-4.420686</td>\n",
       "      <td>-0.726940</td>\n",
       "      <td>1.482708</td>\n",
       "      <td>1.186216</td>\n",
       "      <td>3.204021</td>\n",
       "      <td>3.167583</td>\n",
       "      <td>-5.559435</td>\n",
       "      <td>-1.400205</td>\n",
       "      <td>-0.586602</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Subj: Detective</th>\n",
       "      <td>-2.892448</td>\n",
       "      <td>-3.486243</td>\n",
       "      <td>-2.341198</td>\n",
       "      <td>-1.954026</td>\n",
       "      <td>-4.569311</td>\n",
       "      <td>-3.193520</td>\n",
       "      <td>0.057454</td>\n",
       "      <td>-2.430884</td>\n",
       "      <td>0.013604</td>\n",
       "      <td>0.784055</td>\n",
       "      <td>...</td>\n",
       "      <td>0.645024</td>\n",
       "      <td>0.991871</td>\n",
       "      <td>0.049671</td>\n",
       "      <td>0.321690</td>\n",
       "      <td>-2.505647</td>\n",
       "      <td>-0.100738</td>\n",
       "      <td>-3.467710</td>\n",
       "      <td>0.332897</td>\n",
       "      <td>-2.494157</td>\n",
       "      <td>-1.884807</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Subj: Fairy tales</th>\n",
       "      <td>-2.583980</td>\n",
       "      <td>-2.724984</td>\n",
       "      <td>-1.546431</td>\n",
       "      <td>-1.014638</td>\n",
       "      <td>-3.597766</td>\n",
       "      <td>-0.164590</td>\n",
       "      <td>-3.142783</td>\n",
       "      <td>-1.773243</td>\n",
       "      <td>-2.415182</td>\n",
       "      <td>1.720737</td>\n",
       "      <td>...</td>\n",
       "      <td>-2.269281</td>\n",
       "      <td>1.730525</td>\n",
       "      <td>-0.308872</td>\n",
       "      <td>0.260321</td>\n",
       "      <td>-1.962307</td>\n",
       "      <td>-0.151694</td>\n",
       "      <td>0.733644</td>\n",
       "      <td>-3.395220</td>\n",
       "      <td>-1.568125</td>\n",
       "      <td>-1.088388</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Subj: Fantasy</th>\n",
       "      <td>-2.668297</td>\n",
       "      <td>-3.092663</td>\n",
       "      <td>-1.822580</td>\n",
       "      <td>1.298162</td>\n",
       "      <td>-4.143004</td>\n",
       "      <td>4.067700</td>\n",
       "      <td>-3.673624</td>\n",
       "      <td>0.522599</td>\n",
       "      <td>0.891695</td>\n",
       "      <td>1.887314</td>\n",
       "      <td>...</td>\n",
       "      <td>-2.088838</td>\n",
       "      <td>2.274201</td>\n",
       "      <td>-0.632414</td>\n",
       "      <td>2.641273</td>\n",
       "      <td>2.229587</td>\n",
       "      <td>-0.141393</td>\n",
       "      <td>-3.187113</td>\n",
       "      <td>-0.141829</td>\n",
       "      <td>-1.772265</td>\n",
       "      <td>-1.191326</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Subj: History</th>\n",
       "      <td>0.986042</td>\n",
       "      <td>0.469718</td>\n",
       "      <td>1.393442</td>\n",
       "      <td>-2.633287</td>\n",
       "      <td>0.130039</td>\n",
       "      <td>0.009999</td>\n",
       "      <td>2.966758</td>\n",
       "      <td>0.567466</td>\n",
       "      <td>-1.478546</td>\n",
       "      <td>0.174997</td>\n",
       "      <td>...</td>\n",
       "      <td>0.765809</td>\n",
       "      <td>0.730477</td>\n",
       "      <td>-0.957024</td>\n",
       "      <td>-1.258122</td>\n",
       "      <td>-0.789173</td>\n",
       "      <td>-3.982058</td>\n",
       "      <td>-1.568918</td>\n",
       "      <td>-0.659166</td>\n",
       "      <td>2.374608</td>\n",
       "      <td>0.596812</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Subj: Horror</th>\n",
       "      <td>-2.832677</td>\n",
       "      <td>-3.208027</td>\n",
       "      <td>-1.930188</td>\n",
       "      <td>-1.336075</td>\n",
       "      <td>-4.170821</td>\n",
       "      <td>1.950234</td>\n",
       "      <td>-3.862299</td>\n",
       "      <td>4.380642</td>\n",
       "      <td>-3.058112</td>\n",
       "      <td>-3.007224</td>\n",
       "      <td>...</td>\n",
       "      <td>-1.891002</td>\n",
       "      <td>-0.161921</td>\n",
       "      <td>-0.171874</td>\n",
       "      <td>0.930632</td>\n",
       "      <td>0.976890</td>\n",
       "      <td>0.536694</td>\n",
       "      <td>-0.729069</td>\n",
       "      <td>-0.759351</td>\n",
       "      <td>-2.061232</td>\n",
       "      <td>1.087024</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Subj: Humor</th>\n",
       "      <td>-2.400227</td>\n",
       "      <td>-2.928306</td>\n",
       "      <td>-1.756554</td>\n",
       "      <td>-1.407832</td>\n",
       "      <td>-3.603519</td>\n",
       "      <td>-2.528873</td>\n",
       "      <td>-3.201279</td>\n",
       "      <td>-2.001259</td>\n",
       "      <td>4.089647</td>\n",
       "      <td>-3.941342</td>\n",
       "      <td>...</td>\n",
       "      <td>4.670883</td>\n",
       "      <td>-0.985015</td>\n",
       "      <td>0.746169</td>\n",
       "      <td>1.020848</td>\n",
       "      <td>-1.939398</td>\n",
       "      <td>1.151453</td>\n",
       "      <td>-2.434799</td>\n",
       "      <td>-3.332200</td>\n",
       "      <td>-1.513113</td>\n",
       "      <td>-1.387125</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Subj: Juvenile</th>\n",
       "      <td>0.936398</td>\n",
       "      <td>-3.135787</td>\n",
       "      <td>-1.443649</td>\n",
       "      <td>-1.033527</td>\n",
       "      <td>-4.332381</td>\n",
       "      <td>-2.494904</td>\n",
       "      <td>-3.927214</td>\n",
       "      <td>-1.756085</td>\n",
       "      <td>-0.587908</td>\n",
       "      <td>3.026684</td>\n",
       "      <td>...</td>\n",
       "      <td>-0.985015</td>\n",
       "      <td>3.683932</td>\n",
       "      <td>-0.648682</td>\n",
       "      <td>-2.590884</td>\n",
       "      <td>-2.358327</td>\n",
       "      <td>-0.907437</td>\n",
       "      <td>-3.553155</td>\n",
       "      <td>-3.538587</td>\n",
       "      <td>-1.358513</td>\n",
       "      <td>-1.871230</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Subj: Man-woman</th>\n",
       "      <td>0.441526</td>\n",
       "      <td>0.821285</td>\n",
       "      <td>-2.944366</td>\n",
       "      <td>1.248363</td>\n",
       "      <td>1.263024</td>\n",
       "      <td>0.011914</td>\n",
       "      <td>0.290489</td>\n",
       "      <td>-3.296209</td>\n",
       "      <td>1.357892</td>\n",
       "      <td>-1.469373</td>\n",
       "      <td>...</td>\n",
       "      <td>0.746169</td>\n",
       "      <td>-0.648682</td>\n",
       "      <td>4.044184</td>\n",
       "      <td>-3.141020</td>\n",
       "      <td>-2.378338</td>\n",
       "      <td>-1.049883</td>\n",
       "      <td>-3.689196</td>\n",
       "      <td>-0.912569</td>\n",
       "      <td>-0.342573</td>\n",
       "      <td>1.165637</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Subj: SF, American</th>\n",
       "      <td>-2.845089</td>\n",
       "      <td>-3.273126</td>\n",
       "      <td>-2.256043</td>\n",
       "      <td>-1.256016</td>\n",
       "      <td>-3.971152</td>\n",
       "      <td>-0.423531</td>\n",
       "      <td>-3.801247</td>\n",
       "      <td>-2.493054</td>\n",
       "      <td>-2.979538</td>\n",
       "      <td>-2.882120</td>\n",
       "      <td>...</td>\n",
       "      <td>1.020848</td>\n",
       "      <td>-2.590884</td>\n",
       "      <td>-3.141020</td>\n",
       "      <td>4.425853</td>\n",
       "      <td>4.175940</td>\n",
       "      <td>-0.385268</td>\n",
       "      <td>-3.567824</td>\n",
       "      <td>-3.843450</td>\n",
       "      <td>-2.073243</td>\n",
       "      <td>-1.598127</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Subj: SF, Other</th>\n",
       "      <td>1.117171</td>\n",
       "      <td>-2.619720</td>\n",
       "      <td>-1.651594</td>\n",
       "      <td>-1.016478</td>\n",
       "      <td>-3.427581</td>\n",
       "      <td>0.611025</td>\n",
       "      <td>-3.134623</td>\n",
       "      <td>-1.911615</td>\n",
       "      <td>-2.398768</td>\n",
       "      <td>-2.457515</td>\n",
       "      <td>...</td>\n",
       "      <td>-1.939398</td>\n",
       "      <td>-2.358327</td>\n",
       "      <td>-2.378338</td>\n",
       "      <td>4.175940</td>\n",
       "      <td>4.788612</td>\n",
       "      <td>-2.198735</td>\n",
       "      <td>-3.023396</td>\n",
       "      <td>-3.600603</td>\n",
       "      <td>-1.683120</td>\n",
       "      <td>-0.990152</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Subj: Short stories, American</th>\n",
       "      <td>-2.666464</td>\n",
       "      <td>-3.338657</td>\n",
       "      <td>-2.421566</td>\n",
       "      <td>-1.682466</td>\n",
       "      <td>-1.001827</td>\n",
       "      <td>-3.223219</td>\n",
       "      <td>-4.045185</td>\n",
       "      <td>-2.608324</td>\n",
       "      <td>-3.169652</td>\n",
       "      <td>-3.453150</td>\n",
       "      <td>...</td>\n",
       "      <td>1.151453</td>\n",
       "      <td>-0.907437</td>\n",
       "      <td>-1.049883</td>\n",
       "      <td>-0.385268</td>\n",
       "      <td>-2.198735</td>\n",
       "      <td>4.760188</td>\n",
       "      <td>3.721933</td>\n",
       "      <td>-3.916583</td>\n",
       "      <td>-2.030850</td>\n",
       "      <td>-1.365542</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Subj: Short stories, Other</th>\n",
       "      <td>-3.010117</td>\n",
       "      <td>-3.604928</td>\n",
       "      <td>-2.428072</td>\n",
       "      <td>-1.673172</td>\n",
       "      <td>-4.536183</td>\n",
       "      <td>-3.228188</td>\n",
       "      <td>-4.163451</td>\n",
       "      <td>-0.432951</td>\n",
       "      <td>-3.294340</td>\n",
       "      <td>-3.962527</td>\n",
       "      <td>...</td>\n",
       "      <td>-2.434799</td>\n",
       "      <td>-3.553155</td>\n",
       "      <td>-3.689196</td>\n",
       "      <td>-3.567824</td>\n",
       "      <td>-3.023396</td>\n",
       "      <td>3.721933</td>\n",
       "      <td>4.178335</td>\n",
       "      <td>-4.002770</td>\n",
       "      <td>-2.048282</td>\n",
       "      <td>-1.748679</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Suspense</th>\n",
       "      <td>-1.688237</td>\n",
       "      <td>-0.693552</td>\n",
       "      <td>-3.985702</td>\n",
       "      <td>-0.660272</td>\n",
       "      <td>-0.428196</td>\n",
       "      <td>0.249198</td>\n",
       "      <td>0.117043</td>\n",
       "      <td>0.245167</td>\n",
       "      <td>-0.207172</td>\n",
       "      <td>-3.733282</td>\n",
       "      <td>...</td>\n",
       "      <td>-3.332200</td>\n",
       "      <td>-3.538587</td>\n",
       "      <td>-0.912569</td>\n",
       "      <td>-3.843450</td>\n",
       "      <td>-3.600603</td>\n",
       "      <td>-3.916583</td>\n",
       "      <td>-4.002770</td>\n",
       "      <td>3.186312</td>\n",
       "      <td>-3.620935</td>\n",
       "      <td>-2.896259</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>War</th>\n",
       "      <td>0.795667</td>\n",
       "      <td>2.265967</td>\n",
       "      <td>1.844265</td>\n",
       "      <td>-1.463414</td>\n",
       "      <td>0.661536</td>\n",
       "      <td>-0.496345</td>\n",
       "      <td>2.682408</td>\n",
       "      <td>-2.487540</td>\n",
       "      <td>-2.925803</td>\n",
       "      <td>0.312924</td>\n",
       "      <td>...</td>\n",
       "      <td>-1.513113</td>\n",
       "      <td>-1.358513</td>\n",
       "      <td>-0.342573</td>\n",
       "      <td>-2.073243</td>\n",
       "      <td>-1.683120</td>\n",
       "      <td>-2.030850</td>\n",
       "      <td>-2.048282</td>\n",
       "      <td>-3.620935</td>\n",
       "      <td>4.984383</td>\n",
       "      <td>-1.265709</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Western</th>\n",
       "      <td>-2.159260</td>\n",
       "      <td>0.977699</td>\n",
       "      <td>2.425606</td>\n",
       "      <td>-0.668447</td>\n",
       "      <td>0.218978</td>\n",
       "      <td>-2.330399</td>\n",
       "      <td>1.667594</td>\n",
       "      <td>0.969952</td>\n",
       "      <td>-2.301749</td>\n",
       "      <td>-1.991889</td>\n",
       "      <td>...</td>\n",
       "      <td>-1.387125</td>\n",
       "      <td>-1.871230</td>\n",
       "      <td>1.165637</td>\n",
       "      <td>-1.598127</td>\n",
       "      <td>-0.990152</td>\n",
       "      <td>-1.365542</td>\n",
       "      <td>-1.748679</td>\n",
       "      <td>-2.896259</td>\n",
       "      <td>-1.265709</td>\n",
       "      <td>5.571702</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>32 rows × 32 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                               Adventure  Bildungsroman  Biographical  \\\n",
       "Adventure                       4.279903       0.435610      0.343080   \n",
       "Bildungsroman                   0.435610       4.053356     -3.165622   \n",
       "Biographical                    0.343080      -3.165622      4.916599   \n",
       "Christian                       0.221690       1.131953      3.110273   \n",
       "Domestic                        0.305965       1.720973      0.009942   \n",
       "Fantasy                         0.588636      -0.694249     -2.894851   \n",
       "Historical                      1.379508       1.093455      2.114180   \n",
       "Horror                         -3.243798      -0.328167     -2.517358   \n",
       "Humor                           1.227815       1.701955     -2.962145   \n",
       "Juvenile                        1.130951      -3.335126     -1.474224   \n",
       "Love                            0.454138       0.840889      1.066856   \n",
       "Mystery                         0.094113      -2.462436     -3.930831   \n",
       "Novel                           0.298461       0.494692      0.200954   \n",
       "Political                       2.022527      -0.045063     -2.040077   \n",
       "Psychological                   0.273890       0.759166      0.313571   \n",
       "SF                              0.999594      -1.056902     -2.721311   \n",
       "Short stories                  -4.082887      -4.822782     -3.903740   \n",
       "Subj: Detective                -2.892448      -3.486243     -2.341198   \n",
       "Subj: Fairy tales              -2.583980      -2.724984     -1.546431   \n",
       "Subj: Fantasy                  -2.668297      -3.092663     -1.822580   \n",
       "Subj: History                   0.986042       0.469718      1.393442   \n",
       "Subj: Horror                   -2.832677      -3.208027     -1.930188   \n",
       "Subj: Humor                    -2.400227      -2.928306     -1.756554   \n",
       "Subj: Juvenile                  0.936398      -3.135787     -1.443649   \n",
       "Subj: Man-woman                 0.441526       0.821285     -2.944366   \n",
       "Subj: SF, American             -2.845089      -3.273126     -2.256043   \n",
       "Subj: SF, Other                 1.117171      -2.619720     -1.651594   \n",
       "Subj: Short stories, American  -2.666464      -3.338657     -2.421566   \n",
       "Subj: Short stories, Other     -3.010117      -3.604928     -2.428072   \n",
       "Suspense                       -1.688237      -0.693552     -3.985702   \n",
       "War                             0.795667       2.265967      1.844265   \n",
       "Western                        -2.159260       0.977699      2.425606   \n",
       "\n",
       "                               Christian  Domestic   Fantasy  Historical  \\\n",
       "Adventure                       0.221690  0.305965  0.588636    1.379508   \n",
       "Bildungsroman                   1.131953  1.720973 -0.694249    1.093455   \n",
       "Biographical                    3.110273  0.009942 -2.894851    2.114180   \n",
       "Christian                       5.416200  1.049163  2.156130    1.000321   \n",
       "Domestic                        1.049163  3.189220 -2.378864    1.054664   \n",
       "Fantasy                         2.156130 -2.378864  4.234711    0.909377   \n",
       "Historical                      1.000321  1.054664  0.909377    3.489242   \n",
       "Horror                         -2.123703 -1.374636  2.179593    0.085551   \n",
       "Humor                          -2.363184  0.861812  0.245513   -1.092102   \n",
       "Juvenile                       -1.696563 -4.720872 -0.639921   -1.965720   \n",
       "Love                            2.012803  0.651874  0.296712    0.995560   \n",
       "Mystery                        -0.743228 -1.086038 -0.131421    0.971470   \n",
       "Novel                          -0.011152  0.549282  0.623512    0.595649   \n",
       "Political                       1.845372  1.366772 -2.819496    0.855458   \n",
       "Psychological                  -0.164644  1.565642  0.342645    0.071484   \n",
       "SF                             -1.934659 -4.521642  1.119701   -1.909738   \n",
       "Short stories                  -3.422832 -2.103334  0.277950   -5.465095   \n",
       "Subj: Detective                -1.954026 -4.569311 -3.193520    0.057454   \n",
       "Subj: Fairy tales              -1.014638 -3.597766 -0.164590   -3.142783   \n",
       "Subj: Fantasy                   1.298162 -4.143004  4.067700   -3.673624   \n",
       "Subj: History                  -2.633287  0.130039  0.009999    2.966758   \n",
       "Subj: Horror                   -1.336075 -4.170821  1.950234   -3.862299   \n",
       "Subj: Humor                    -1.407832 -3.603519 -2.528873   -3.201279   \n",
       "Subj: Juvenile                 -1.033527 -4.332381 -2.494904   -3.927214   \n",
       "Subj: Man-woman                 1.248363  1.263024  0.011914    0.290489   \n",
       "Subj: SF, American             -1.256016 -3.971152 -0.423531   -3.801247   \n",
       "Subj: SF, Other                -1.016478 -3.427581  0.611025   -3.134623   \n",
       "Subj: Short stories, American  -1.682466 -1.001827 -3.223219   -4.045185   \n",
       "Subj: Short stories, Other     -1.673172 -4.536183 -3.228188   -4.163451   \n",
       "Suspense                       -0.660272 -0.428196  0.249198    0.117043   \n",
       "War                            -1.463414  0.661536 -0.496345    2.682408   \n",
       "Western                        -0.668447  0.218978 -2.330399    1.667594   \n",
       "\n",
       "                                 Horror     Humor  Juvenile    ...     \\\n",
       "Adventure                     -3.243798  1.227815  1.130951    ...      \n",
       "Bildungsroman                 -0.328167  1.701955 -3.335126    ...      \n",
       "Biographical                  -2.517358 -2.962145 -1.474224    ...      \n",
       "Christian                     -2.123703 -2.363184 -1.696563    ...      \n",
       "Domestic                      -1.374636  0.861812 -4.720872    ...      \n",
       "Fantasy                        2.179593  0.245513 -0.639921    ...      \n",
       "Historical                     0.085551 -1.092102 -1.965720    ...      \n",
       "Horror                         4.612799 -3.410408 -2.254403    ...      \n",
       "Humor                         -3.410408  4.089647 -0.521751    ...      \n",
       "Juvenile                      -2.254403 -0.521751  3.087760    ...      \n",
       "Love                          -0.121174  1.410084 -4.539916    ...      \n",
       "Mystery                       -0.095212  0.321914 -2.342038    ...      \n",
       "Novel                          0.910913  0.423314 -1.383259    ...      \n",
       "Political                     -2.499792  1.129610 -1.893787    ...      \n",
       "Psychological                  0.120266  0.127164 -4.568209    ...      \n",
       "SF                             1.006739 -0.029564 -2.488945    ...      \n",
       "Short stories                  1.369402 -4.697734 -2.311500    ...      \n",
       "Subj: Detective               -2.430884  0.013604  0.784055    ...      \n",
       "Subj: Fairy tales             -1.773243 -2.415182  1.720737    ...      \n",
       "Subj: Fantasy                  0.522599  0.891695  1.887314    ...      \n",
       "Subj: History                  0.567466 -1.478546  0.174997    ...      \n",
       "Subj: Horror                   4.380642 -3.058112 -3.007224    ...      \n",
       "Subj: Humor                   -2.001259  4.089647 -3.941342    ...      \n",
       "Subj: Juvenile                -1.756085 -0.587908  3.026684    ...      \n",
       "Subj: Man-woman               -3.296209  1.357892 -1.469373    ...      \n",
       "Subj: SF, American            -2.493054 -2.979538 -2.882120    ...      \n",
       "Subj: SF, Other               -1.911615 -2.398768 -2.457515    ...      \n",
       "Subj: Short stories, American -2.608324 -3.169652 -3.453150    ...      \n",
       "Subj: Short stories, Other    -0.432951 -3.294340 -3.962527    ...      \n",
       "Suspense                       0.245167 -0.207172 -3.733282    ...      \n",
       "War                           -2.487540 -2.925803  0.312924    ...      \n",
       "Western                        0.969952 -2.301749 -1.991889    ...      \n",
       "\n",
       "                               Subj: Humor  Subj: Juvenile  Subj: Man-woman  \\\n",
       "Adventure                        -2.400227        0.936398         0.441526   \n",
       "Bildungsroman                    -2.928306       -3.135787         0.821285   \n",
       "Biographical                     -1.756554       -1.443649        -2.944366   \n",
       "Christian                        -1.407832       -1.033527         1.248363   \n",
       "Domestic                         -3.603519       -4.332381         1.263024   \n",
       "Fantasy                          -2.528873       -2.494904         0.011914   \n",
       "Historical                       -3.201279       -3.927214         0.290489   \n",
       "Horror                           -2.001259       -1.756085        -3.296209   \n",
       "Humor                             4.089647       -0.587908         1.357892   \n",
       "Juvenile                         -3.941342        3.026684        -1.469373   \n",
       "Love                             -1.133929       -4.039011         3.335897   \n",
       "Mystery                          -3.605074       -4.371259        -2.335491   \n",
       "Novel                            -0.806427       -0.997143         0.698427   \n",
       "Political                        -1.483921       -1.540766        -0.280696   \n",
       "Psychological                    -3.639832       -4.183314         0.755981   \n",
       "SF                               -2.293161       -2.652264        -0.978012   \n",
       "Short stories                     0.243638       -4.420686        -0.726940   \n",
       "Subj: Detective                   0.645024        0.991871         0.049671   \n",
       "Subj: Fairy tales                -2.269281        1.730525        -0.308872   \n",
       "Subj: Fantasy                    -2.088838        2.274201        -0.632414   \n",
       "Subj: History                     0.765809        0.730477        -0.957024   \n",
       "Subj: Horror                     -1.891002       -0.161921        -0.171874   \n",
       "Subj: Humor                       4.670883       -0.985015         0.746169   \n",
       "Subj: Juvenile                   -0.985015        3.683932        -0.648682   \n",
       "Subj: Man-woman                   0.746169       -0.648682         4.044184   \n",
       "Subj: SF, American                1.020848       -2.590884        -3.141020   \n",
       "Subj: SF, Other                  -1.939398       -2.358327        -2.378338   \n",
       "Subj: Short stories, American     1.151453       -0.907437        -1.049883   \n",
       "Subj: Short stories, Other       -2.434799       -3.553155        -3.689196   \n",
       "Suspense                         -3.332200       -3.538587        -0.912569   \n",
       "War                              -1.513113       -1.358513        -0.342573   \n",
       "Western                          -1.387125       -1.871230         1.165637   \n",
       "\n",
       "                               Subj: SF, American  Subj: SF, Other  \\\n",
       "Adventure                               -2.845089         1.117171   \n",
       "Bildungsroman                           -3.273126        -2.619720   \n",
       "Biographical                            -2.256043        -1.651594   \n",
       "Christian                               -1.256016        -1.016478   \n",
       "Domestic                                -3.971152        -3.427581   \n",
       "Fantasy                                 -0.423531         0.611025   \n",
       "Historical                              -3.801247        -3.134623   \n",
       "Horror                                  -2.493054        -1.911615   \n",
       "Humor                                   -2.979538        -2.398768   \n",
       "Juvenile                                -2.882120        -2.457515   \n",
       "Love                                    -3.661264        -2.987632   \n",
       "Mystery                                 -3.988290        -1.118957   \n",
       "Novel                                   -0.771234        -0.333310   \n",
       "Political                                0.272348        -1.380026   \n",
       "Psychological                           -4.100278        -1.079347   \n",
       "SF                                       4.050927         4.230921   \n",
       "Short stories                            1.482708         1.186216   \n",
       "Subj: Detective                          0.321690        -2.505647   \n",
       "Subj: Fairy tales                        0.260321        -1.962307   \n",
       "Subj: Fantasy                            2.641273         2.229587   \n",
       "Subj: History                           -1.258122        -0.789173   \n",
       "Subj: Horror                             0.930632         0.976890   \n",
       "Subj: Humor                              1.020848        -1.939398   \n",
       "Subj: Juvenile                          -2.590884        -2.358327   \n",
       "Subj: Man-woman                         -3.141020        -2.378338   \n",
       "Subj: SF, American                       4.425853         4.175940   \n",
       "Subj: SF, Other                          4.175940         4.788612   \n",
       "Subj: Short stories, American           -0.385268        -2.198735   \n",
       "Subj: Short stories, Other              -3.567824        -3.023396   \n",
       "Suspense                                -3.843450        -3.600603   \n",
       "War                                     -2.073243        -1.683120   \n",
       "Western                                 -1.598127        -0.990152   \n",
       "\n",
       "                               Subj: Short stories, American  \\\n",
       "Adventure                                          -2.666464   \n",
       "Bildungsroman                                      -3.338657   \n",
       "Biographical                                       -2.421566   \n",
       "Christian                                          -1.682466   \n",
       "Domestic                                           -1.001827   \n",
       "Fantasy                                            -3.223219   \n",
       "Historical                                         -4.045185   \n",
       "Horror                                             -2.608324   \n",
       "Humor                                              -3.169652   \n",
       "Juvenile                                           -3.453150   \n",
       "Love                                               -1.668833   \n",
       "Mystery                                            -4.342293   \n",
       "Novel                                              -2.204803   \n",
       "Political                                          -1.981697   \n",
       "Psychological                                      -4.361342   \n",
       "SF                                                 -2.961598   \n",
       "Short stories                                       3.204021   \n",
       "Subj: Detective                                    -0.100738   \n",
       "Subj: Fairy tales                                  -0.151694   \n",
       "Subj: Fantasy                                      -0.141393   \n",
       "Subj: History                                      -3.982058   \n",
       "Subj: Horror                                        0.536694   \n",
       "Subj: Humor                                         1.151453   \n",
       "Subj: Juvenile                                     -0.907437   \n",
       "Subj: Man-woman                                    -1.049883   \n",
       "Subj: SF, American                                 -0.385268   \n",
       "Subj: SF, Other                                    -2.198735   \n",
       "Subj: Short stories, American                       4.760188   \n",
       "Subj: Short stories, Other                          3.721933   \n",
       "Suspense                                           -3.916583   \n",
       "War                                                -2.030850   \n",
       "Western                                            -1.365542   \n",
       "\n",
       "                               Subj: Short stories, Other  Suspense       War  \\\n",
       "Adventure                                       -3.010117 -1.688237  0.795667   \n",
       "Bildungsroman                                   -3.604928 -0.693552  2.265967   \n",
       "Biographical                                    -2.428072 -3.985702  1.844265   \n",
       "Christian                                       -1.673172 -0.660272 -1.463414   \n",
       "Domestic                                        -4.536183 -0.428196  0.661536   \n",
       "Fantasy                                         -3.228188  0.249198 -0.496345   \n",
       "Historical                                      -4.163451  0.117043  2.682408   \n",
       "Horror                                          -0.432951  0.245167 -2.487540   \n",
       "Humor                                           -3.294340 -0.207172 -2.925803   \n",
       "Juvenile                                        -3.962527 -3.733282  0.312924   \n",
       "Love                                            -4.232899 -1.385372  1.073255   \n",
       "Mystery                                         -4.559014  0.728169 -0.609320   \n",
       "Novel                                           -3.078950  0.921370  0.711343   \n",
       "Political                                       -2.217211  1.202613 -2.108797   \n",
       "Psychological                                   -4.442837  0.823388 -0.466791   \n",
       "SF                                              -3.237803  0.181503  0.686332   \n",
       "Short stories                                    3.167583 -5.559435 -1.400205   \n",
       "Subj: Detective                                 -3.467710  0.332897 -2.494157   \n",
       "Subj: Fairy tales                                0.733644 -3.395220 -1.568125   \n",
       "Subj: Fantasy                                   -3.187113 -0.141829 -1.772265   \n",
       "Subj: History                                   -1.568918 -0.659166  2.374608   \n",
       "Subj: Horror                                    -0.729069 -0.759351 -2.061232   \n",
       "Subj: Humor                                     -2.434799 -3.332200 -1.513113   \n",
       "Subj: Juvenile                                  -3.553155 -3.538587 -1.358513   \n",
       "Subj: Man-woman                                 -3.689196 -0.912569 -0.342573   \n",
       "Subj: SF, American                              -3.567824 -3.843450 -2.073243   \n",
       "Subj: SF, Other                                 -3.023396 -3.600603 -1.683120   \n",
       "Subj: Short stories, American                    3.721933 -3.916583 -2.030850   \n",
       "Subj: Short stories, Other                       4.178335 -4.002770 -2.048282   \n",
       "Suspense                                        -4.002770  3.186312 -3.620935   \n",
       "War                                             -2.048282 -3.620935  4.984383   \n",
       "Western                                         -1.748679 -2.896259 -1.265709   \n",
       "\n",
       "                                Western  \n",
       "Adventure                     -2.159260  \n",
       "Bildungsroman                  0.977699  \n",
       "Biographical                   2.425606  \n",
       "Christian                     -0.668447  \n",
       "Domestic                       0.218978  \n",
       "Fantasy                       -2.330399  \n",
       "Historical                     1.667594  \n",
       "Horror                         0.969952  \n",
       "Humor                         -2.301749  \n",
       "Juvenile                      -1.991889  \n",
       "Love                           0.930612  \n",
       "Mystery                       -3.262348  \n",
       "Novel                          0.071245  \n",
       "Political                     -1.540723  \n",
       "Psychological                 -3.414557  \n",
       "SF                            -2.099872  \n",
       "Short stories                 -0.586602  \n",
       "Subj: Detective               -1.884807  \n",
       "Subj: Fairy tales             -1.088388  \n",
       "Subj: Fantasy                 -1.191326  \n",
       "Subj: History                  0.596812  \n",
       "Subj: Horror                   1.087024  \n",
       "Subj: Humor                   -1.387125  \n",
       "Subj: Juvenile                -1.871230  \n",
       "Subj: Man-woman                1.165637  \n",
       "Subj: SF, American            -1.598127  \n",
       "Subj: SF, Other               -0.990152  \n",
       "Subj: Short stories, American -1.365542  \n",
       "Subj: Short stories, Other    -1.748679  \n",
       "Suspense                      -2.896259  \n",
       "War                           -1.265709  \n",
       "Western                        5.571702  \n",
       "\n",
       "[32 rows x 32 columns]"
      ]
     },
     "execution_count": 121,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pmidf"
   ]
  },
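  {
   "cell_type": "code",
   "execution_count": 133,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write the 32 x 32 genre-by-genre matrix to disk;\n",
    "# the row labels (genre names) are stored in a column named 'index'.\n",
    "pmidf.to_csv('../socialmeasures/pmidf.csv', index_label = 'index')"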
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
